dbt in Production: Testing, CI/CD, and the Medallion Architecture

dbt is deceptively approachable. You can start with a single model, run dbt run, and feel productive in an afternoon. That first win is important. It is also where most teams stop. They ship a few models, then add a few more, and suddenly the folder is full of SQL files that no one really owns. When the data quality drifts or a metric breaks, the team blames upstream sources, reruns jobs, and moves on.

The hard lesson from production is that SQL files are the least important part of a dbt platform. The real value comes from disciplined structure, automated tests, and a delivery workflow that treats models like code. That is what makes dbt scale across teams and time. In the rest of this post, I will walk through the patterns I use to keep a dbt project production-grade, including how we ran it for a 1.5M+ subscriber news publisher without the analytics team losing trust.

The Medallion Architecture With dbt

The medallion architecture is not a buzzword. It is the simplest mental model that keeps analytics sane. Each layer has a purpose. Bronze is raw and minimally cleaned. Silver is standardized and ready for reuse. Gold is curated, business-facing data that drives dashboards and product decisions.

In dbt terms, that typically maps to staging, intermediate, and marts. I treat staging models as the Bronze to Silver boundary. They rename columns, standardize types, and make the raw data consistent. Intermediate models are where I centralize business logic, joins, and calculations. Those models are still internal, but they are reusable building blocks. Finally, marts are Gold models designed for specific analytics use cases. If a model is directly queried by an analyst or a BI tool, it belongs in the mart layer.
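A staging model in this style might look like the sketch below. The source name app_db, the raw column names (id, email, created, status), and the output names are assumptions for illustration, not the publisher's actual schema.

```sql
-- models/staging/stg_users.sql (hypothetical example)
-- Staging only renames, casts, and normalizes. No joins, no business logic.
with source as (
  -- 'app_db' and 'users' are assumed names declared in a sources YAML file
  select * from {{ source('app_db', 'users') }}
),

renamed as (
  select
    id                          as user_id,
    lower(trim(email))          as email_address,
    cast(created as timestamp)  as created_at,
    status                      as account_status
  from source
)

select * from renamed
```

Everything downstream refers to stg_users through ref(), so a renamed or retyped source column is absorbed in this one file.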

This separation matters because it makes change safe. When a source field shifts, I update the staging model and keep the rest of the stack stable. When a metric definition evolves, I change an intermediate model and propagate it to the marts without rewriting every dashboard. That is how you keep a platform reliable across years of changes, not just weeks.

models/
  staging/
    stg_users.sql
    stg_subscriptions.sql
  intermediate/
    int_user_lifecycle.sql
    int_subscription_status.sql
  marts/
    mart_user_growth.sql
    mart_revenue_retention.sql

At the news publisher, this structure let us scale fast. Editorial analytics, subscription growth, and ad performance all had different needs. By anchoring everything in shared staging and intermediate models, we reduced duplicate logic and made the final marts more trustworthy.

Data Testing Is the CI Gate

dbt ships with a testing framework that is easy to ignore and painful to skip. The basics are powerful: not_null, unique, accepted_values, and relationships. Those four cover the majority of real data quality failures I see in production.

The moment you move beyond the basics, custom generic tests become essential. I lean on dbt-expectations for common patterns like row count comparisons, percent thresholds, and column type enforcement. I also write a few in-house generic tests for business-specific rules. The key is that tests live next to the models and run automatically in CI. If a test fails, the PR does not merge. That gate is what separates a dbt platform from a loose collection of SQL files.
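As a rough sketch of what an in-house generic test can look like, here is one that checks a timestamp column never precedes another. The test name not_before and the argument earlier_column are hypothetical, chosen only for illustration. A generic test is a select that returns the offending rows, and it passes when the query returns nothing.

```sql
-- tests/generic/not_before.sql (hypothetical in-house generic test)
-- Fails for every row where column_name is earlier than earlier_column.
{% test not_before(model, column_name, earlier_column) %}

select *
from {{ model }}
where {{ column_name }} < {{ earlier_column }}

{% endtest %}
```

Once defined, it is attached to a column in the model's YAML the same way as the built-in tests, for example by listing not_before under the tests of an ended_at column with earlier_column set to started_at (both column names assumed).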

Here is a real example from a user mart model. We had an email dimension that marketing and growth both used, and duplicates caused real campaign errors. The solution was a simple uniqueness test, but the impact was massive because it ran every time anyone touched the model.

version: 2

models:
  - name: mart_users
    description: "Analytics-ready user dimension"
    columns:
      - name: user_id
        tests:
          - not_null
          - unique
      - name: email_address
        tests:
          - not_null
          - unique
      - name: subscription_status
        tests:
          - accepted_values:
              values: ["active", "canceled", "trial", "expired"]

At the 1.5M+ subscriber publisher, we treated tests as a contract. If a test failed, the on-call data engineer investigated before analytics teams felt the issue. That discipline changed how much the business trusted data, which is the real KPI for any data platform.

CI/CD for dbt

dbt makes local development easy, but it is useless without a consistent delivery pipeline. My default setup is GitHub Actions on every pull request. The workflow runs dbt build for modified models, then dbt test, then dbt docs generate to keep documentation and lineage current. This is the basic CI contract for dbt.

To keep CI fast, I use the slim CI pattern with the state:modified+ selector. That means we only build models that changed and their downstream dependencies. It keeps PR feedback fast without sacrificing coverage.

For deployments, I prefer branch per environment. Main merges deploy to production. A long-lived develop or staging branch deploys to a staging warehouse. Feature branches run CI only. This mirrors how application engineering teams ship code, and it works just as well for analytics.

name: dbt-ci

on:
  pull_request:
    branches: ["main"]

jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install deps
        run: |
          pip install dbt-core dbt-snowflake
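      - name: dbt build (slim CI)
        run: |
          # --state must point at a directory holding manifest.json from the last
          # production run. ./prod-artifacts is an assumed location, populated by a
          # download step not shown here. Keep it separate from ./target, which
          # every dbt invocation overwrites.
          dbt build --select state:modified+ --defer --state ./prod-artifacts
      - name: dbt test
        run: |
          dbt test --select state:modified+ --defer --state ./prod-artifacts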
      - name: dbt docs generate
        run: |
          dbt docs generate

At the publisher, this workflow meant a model change could not land without tests, and everyone knew it. It is one of the highest leverage changes you can make to a dbt project.

Incremental Models Done Right

Incremental models are a blessing and a trap. The macro is_incremental() makes it easy to filter for new records, but it does not protect you from late arriving data, updated records, or silent duplication. If you are not careful, you will miss changes and never notice.

The fix is a lookback window and a deterministic unique key. Every incremental model should reprocess a small slice of recent history. For example, we often reprocess the last three days of data, then upsert by a stable unique key. That catches late arriving events without forcing a full refresh. It also keeps the model idempotent, which is the real goal.

When I built subscriber revenue models at the news publisher, a lookback window was essential. Subscription changes arrive late, and refunds can show up days after the initial transaction. The model needed to stay correct, not just fast.

{{
  config(
    materialized='incremental',
    unique_key='subscription_event_id'
  )
}}

with source as (
  select
    subscription_event_id,
    user_id,
    event_type,
    occurred_at,
    amount_cents
  from {{ ref('stg_subscription_events') }}
  {% if is_incremental() %}
    where occurred_at >= dateadd(day, -3, current_timestamp)
  {% endif %}
)

select * from source

This pattern is simple, but it avoids the most common incremental bug I see: silently missing updates. The cost of reprocessing a few days is trivial compared to the cost of bad revenue data.
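One variation worth considering, offered as a sketch rather than what we ran: anchor the lookback on the newest row already in the target table instead of the wall clock. With a wall-clock window, a job that is paused for longer than the window leaves a gap when it resumes. Measuring the three days back from max(occurred_at) in {{ this }} avoids that. The model and column names match the example above, and the 1900-01-01 floor is an arbitrary placeholder.

```sql
{{
  config(
    materialized='incremental',
    unique_key='subscription_event_id'
  )
}}

select
  subscription_event_id,
  user_id,
  event_type,
  occurred_at,
  amount_cents
from {{ ref('stg_subscription_events') }}
{% if is_incremental() %}
  -- Reprocess everything within 3 days of the newest row already loaded.
  -- coalesce guards against an empty target table, where max() returns null.
  where occurred_at >= (
    select dateadd(day, -3, coalesce(max(occurred_at), '1900-01-01'::timestamp))
    from {{ this }}
  )
{% endif %}
```

The trade-off is an extra scan of the target table's occurred_at column on each run, which is usually cheap next to the cost of a silent gap.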

Monitoring and Observability

dbt artifacts are a foundation for observability. The manifest, run results, and catalog files already contain everything you need to build lineage views, freshness checks, and model-level monitoring. Many teams ignore them, which is a missed opportunity.

In practice, I pair dbt artifacts with a tool like Elementary or re_data for anomaly detection. They read dbt metadata, track row counts and distribution shifts, and alert when something changes. The result is a monitoring layer that is tied directly to your dbt models, not a separate data quality system that no one remembers to maintain.

For the 1.5M+ subscriber publisher, this was the difference between reactive and proactive. When a source table dropped 30 percent of its daily volume, we caught it before the morning dashboards went live. That is the level of reliability a production data platform needs.

What Makes dbt Scale

dbt scales when you treat it like software, not a pile of SQL. The medallion architecture creates separation of concerns. Tests encode data contracts. CI/CD enforces those contracts. Incremental models respect reality instead of assuming sources are perfect. Monitoring closes the loop so you can trust what ships.

That is the governance layer. It is not glamorous, but it is what makes a dbt project a dbt platform. At the news publisher, that governance turned 1.5M+ subscriber analytics into something people trusted enough to make revenue decisions on. If you want dbt to scale in your organization, this is the work that matters.

Questions or pushback on any of this? Find me on LinkedIn.


Ryan Kirsch is a senior data engineer with 8+ years building data infrastructure at media, SaaS, and fintech companies. He specializes in Kafka, dbt, Snowflake, and Airflow, and writes about data engineering patterns from production experience. See his full portfolio.