Dagster · Data Pipelines · Orchestration · dbt · Software-Defined Assets

Dagster in Production: Assets, Partitions, and Why Modern Data Teams Are Moving Beyond Airflow

November 3, 2025 · 9 min read

Every data platform needs an orchestrator, and for years Airflow was the only serious option. It defined the category. But Airflow was designed around a simple idea: define tasks, wire them into a DAG, and schedule the DAG. That model works until it does not. At scale, task-centric orchestration creates blind spots. You know what ran and when, but you do not always know what data you have, whether it is fresh, or how it connects to the rest of your platform.

Dagster takes a different approach. Instead of defining tasks, you define the data assets your platform produces. The orchestrator then figures out what to run, when to run it, and whether the result is correct. This shift from “what do I run” to “what do I have” changes how teams build, test, and trust their data infrastructure.

The Problem with Task-Centric Orchestration

Airflow models pipelines as directed acyclic graphs of tasks. Each task is a unit of work: run a query, call an API, move a file. Dependencies connect tasks so they execute in order. This is intuitive and flexible, but it has real limitations at scale.

Testing is difficult because tasks are side effects. You cannot easily run a task in isolation without setting up the entire execution context. Backfills require re-running entire DAGs or writing custom logic to target specific date ranges. There is no native concept of data lineage, so understanding what a downstream dashboard actually depends on means reading code or maintaining external documentation.

The deeper problem is that the DAG describes work, not data. When something breaks, you know which task failed. But answering “is the revenue table fresh?” or “what would happen if I changed this source schema?” requires stitching together information from multiple systems. Dagster was built to solve exactly this gap.

Enter Software-Defined Assets

The core concept in Dagster is the Software-Defined Asset (SDA). An asset is a persistent object in your data platform: a table, a file, a model, a metric. Instead of writing a task that produces data as a side effect, you declare the asset itself and define how to compute it.

This is a fundamental shift. When you define an asset, Dagster knows what your platform produces, what each asset depends on, and when each asset was last computed. The asset graph replaces the task DAG as the primary abstraction, and it maps directly to the artifacts your stakeholders actually care about.

Here is a simple asset definition:

import pandas as pd

from dagster import asset

@asset
def daily_revenue(orders: pd.DataFrame) -> pd.DataFrame:
    """Compute daily revenue from the orders table.
    Groups by date and sums the order totals.
    """
    return (
        orders
        .groupby(orders["order_date"].dt.date)
        .agg(revenue=("order_total", "sum"))
        .reset_index()
    )

The @asset decorator tells Dagster this function produces a named data asset. The parameter “orders” declares a dependency on another asset. Dagster builds the dependency graph automatically, tracks materializations, and gives you a full picture of what data exists and when it was computed.

Partitions: Incremental Processing Done Right

Anyone who has wrestled with Airflow's execution_date knows the pain. Airflow ties execution to schedule intervals, and because the execution_date marks the start of an interval while the run actually fires at its end, reasoning about which data a run processes requires understanding a model that confuses even experienced engineers.

Dagster's partition system is explicit. You define partition keys (daily, monthly, or custom ranges), and each partition represents a slice of data that can be materialized independently. Backfills are a first-class operation: select a range of partitions in the UI, click run, and Dagster handles the rest.

import pandas as pd

from dagster import asset, DailyPartitionsDefinition

daily_partitions = DailyPartitionsDefinition(start_date="2024-01-01")

@asset(partitions_def=daily_partitions)
def daily_orders(context) -> pd.DataFrame:
    """Load orders for a single day partition."""
    partition_date = context.partition_key
    # load_orders_for_date is a placeholder for your own ingestion logic
    return load_orders_for_date(partition_date)

This makes incremental processing predictable. Each partition has its own materialization status, so you always know which slices are fresh and which need recomputation. No more guessing whether yesterday's backfill actually caught everything.

Asset Checks and Data Quality

Dagster builds data quality into the orchestration layer. Asset checks are assertions attached directly to assets. They run after materialization and validate that the data meets expectations: row counts, null percentages, value ranges, freshness thresholds.

This is cleaner than running Great Expectations as a separate pipeline step. The checks live alongside the asset definition, they appear in the Dagster UI with pass or fail status, and failures can trigger alerts or block downstream materializations. Freshness checks are particularly useful. You can define how stale an asset is allowed to be, and Dagster will flag assets that have not been refreshed within the window.

Dagster + dbt: The Perfect Pairing

The dagster-dbt integration is one of the strongest arguments for adopting Dagster. Every dbt model becomes a Dagster asset automatically. You get a unified lineage graph that spans Python assets, dbt SQL models, and external sources, all in one place.

This means your analytics engineering team gets full visibility into the upstream dependencies of their dbt models. When a Python ingestion asset fails, the dbt models that depend on it show as stale in the Dagster UI. Cross-language lineage is rare in the data tooling ecosystem, and it solves a real coordination problem between data engineers and analytics engineers.

The integration also brings Dagster's partition system to dbt. You can partition dbt models by date, run incremental materializations through Dagster's scheduler, and backfill dbt models using the same UI you use for Python assets. For teams running dbt in production, this combination is quickly becoming the gold standard.
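As a rough sketch of the wiring (the project path here is hypothetical, and the real setup follows the dagster-dbt scaffolding, which compiles the manifest for you):

```python
from pathlib import Path

from dagster import AssetExecutionContext
from dagster_dbt import DbtCliResource, dbt_assets

# Hypothetical path to a compiled dbt manifest
DBT_MANIFEST = Path("my_dbt_project/target/manifest.json")


@dbt_assets(manifest=DBT_MANIFEST)
def my_dbt_models(context: AssetExecutionContext, dbt: DbtCliResource):
    # Every dbt model in the manifest becomes a Dagster asset; streaming the
    # CLI invocation reports a materialization event per model
    yield from dbt.cli(["build"], context=context).stream()
```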

Dagster vs Airflow: When to Choose Each

Choose Dagster for greenfield projects where you can build around asset-centric thinking from day one. It excels on teams that are dbt-heavy, need built-in lineage and observability, or want a modern developer experience with type checking and testability.

Choose Airflow when you have a large existing investment in Airflow DAGs that work and do not need refactoring. Airflow's plugin ecosystem is massive, and for ML pipeline workflows, tools like Airflow + MLflow have mature integrations. If your team already knows Airflow well and your orchestration needs are straightforward, switching has a real cost.

The honest answer: most teams starting fresh in 2026 should evaluate Dagster first. The asset model is a better abstraction for data platforms, and the developer experience is significantly ahead.

Branch Deployments and Staging Environments

Dagster Cloud offers branch deployments, which give you isolated staging environments for every pull request. When you open a PR that changes an asset definition, Dagster spins up a deployment with your changes so you can test materializations before merging to production.

This matters because data pipeline changes are notoriously hard to test. You cannot easily spin up a staging Airflow with realistic data. Branch deployments solve this by letting you validate pipeline logic, check asset dependencies, and catch breaking changes before they hit production. For teams that care about data quality, this feature alone can justify the migration.

Closing

Dagster is not just another orchestrator. It represents a genuine shift in how data teams think about pipelines, from sequences of tasks to graphs of data assets. The combination of Software-Defined Assets, native partitioning, built-in data quality, and first-class dbt integration makes it the most compelling orchestration platform for data engineering teams in 2026.

If you are building or evaluating your data platform's orchestration layer, invest the time to learn Dagster deeply. The asset model will change how you think about your entire stack. Check out my data pipeline project on GitHub for a working example of Dagster, dbt, and DuckDB running together.

Questions or pushback on any of this? Find me on LinkedIn.

Ryan Kirsch is a senior data engineer with 8+ years building data infrastructure at media, SaaS, and fintech companies. He specializes in Kafka, dbt, Snowflake, and Spark, and writes about data engineering patterns from production experience. See his full portfolio.