A Perspective on Speed and State

Introduction

On March 19, 2025, dbt Labs hosted dbt Developer Day 2025 and announced new features for dbt Core. Highlights included dbt Core 1.10 (beta) with a “Sample Mode” for quicker development runs, a SQL parser in a new dbt engine, and an official VS Code extension. As the creators of SQLMesh, we’re happy to see attention on developer experience and speed. In this post, we share our viewpoint on how SQLMesh’s stateful approach compares to dbt’s stateless approach, and why state is key to faster, more accurate data pipelines.

We’ll explore the differences between dbt Core (stateless) and SQLMesh (stateful), and discuss:

  • What stateless vs. stateful mean for data pipelines’ speed, accuracy, and developer experience.

  • Why dbt’s new Sample Mode (using only a subset of data) isn’t truly like working with full production data, and how SQLMesh’s Virtual Environments let you safely use complete data for development.

  • Why parsing code faster is nice, but compute is the real cost – or as we say, “A GPS doesn’t make your car go faster.”

  • How SQLMesh’s stateful architecture enables more robust testing, reproducibility, and easy promotion of changes to production.

Let’s dive in!

Stateless vs. Stateful – What’s the Difference?

First, let’s clarify these terms. dbt Core is stateless, while SQLMesh is stateful. This sounds technical, but the idea is simple. A stateless system does not remember past runs or results. Each time you run dbt, it looks at your code and data sources fresh, without built-in memory of what happened before. In contrast, a stateful system like SQLMesh remembers what has already been done – it keeps track of the state of your data pipeline over time.

Imagine a stateless approach like a person who, every day, forgets they already cleaned their room and cleans it all over again. A stateful approach is like someone who remembers what’s already clean and only tidies up any new messes. dbt’s stateless design means reprocessing a lot of data even if nothing has changed, whereas SQLMesh’s stateful design means only processing what’s needed. This has big implications for speed, efficiency, and cost.

Speed and Efficiency: Because dbt doesn’t inherently track state, the common way to develop or test changes is to rebuild tables from scratch in a new environment (like a dev schema) or do a full refresh. This can be time-consuming and costly, especially as your data and number of models grow. It’s like tearing down a house and rebuilding it just to fix a light bulb – doable at a small scale but inefficient as the house (or dataset) gets bigger (dbt + SDF: What Changes and What Doesn't). dbt introduced incremental models to mitigate this, but without a built-in state, users have to write custom logic (with Jinja templates and is_incremental() checks) to manage which data to update. This approach works, but it burdens the developer to figure out things like date cutoffs or missing segments manually. Mistakes can happen, and it’s “messy and error-prone” as data grows.
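To make that concrete, here is what a typical hand-rolled incremental model looks like in dbt (table and column names are illustrative). The cutoff logic lives in the model itself, written and maintained by the developer:

```sql
-- Typical dbt incremental model: the developer hand-writes the "state" logic.
{{ config(materialized='incremental', unique_key='order_id') }}

SELECT order_id, order_date, amount
FROM {{ source('shop', 'orders') }}
{% if is_incremental() %}
  -- Manually chosen cutoff; rows that arrive late with an earlier
  -- order_date will be silently missed by this filter.
  WHERE order_date > (SELECT MAX(order_date) FROM {{ this }})
{% endif %}
```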

SQLMesh takes a different approach. Since SQLMesh is stateful, it knows what data has already been processed and what’s new or changed. You don’t have to code those checks manually – the system handles it. When you modify a model, SQLMesh automatically identifies exactly which tables or partitions are affected. It then recomputes only those specific parts rather than rebuilding the entire dataset. In practical terms, this results in a much faster edit-run-check loop for developers. You avoid redundant work because SQLMesh’s stateful design ensures you're only waiting for the updated portions, significantly accelerating development cycles and reducing costs.
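For contrast, here is a sketch of a comparable model in SQLMesh (names illustrative). The model declares its time column, and the `@start_ds` / `@end_ds` boundaries are filled in by SQLMesh from its record of which intervals still need processing:

```sql
MODEL (
  name analytics.orders,
  kind INCREMENTAL_BY_TIME_RANGE (
    time_column order_date
  ),
  cron '@daily'
);

SELECT order_id, order_date, amount
FROM shop.orders
-- SQLMesh substitutes only the date ranges its state says are missing
WHERE order_date BETWEEN @start_ds AND @end_ds
```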

Accuracy and Consistency: Being stateless can also affect accuracy. Ensuring you have all the correct data can be tricky if each run is independent. For example, in dbt incremental models, if you don’t manage the state correctly, you might miss late-arriving data or accidentally reprocess data twice. A stateful system like SQLMesh keeps track of what data has been added and when it was added, to prevent duplicate processing and handle late data gracefully. The result is a pipeline that’s not only faster but also more reliable – you’re less likely to miss data or introduce errors when catching up on new data. In a stateless tool, the user must implement these safeguards themselves.
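The bookkeeping behind this can be sketched in a few lines of Python. This is a conceptual illustration, not SQLMesh's actual implementation: the idea is simply that recorded intervals prevent double-processing and expose exactly which gaps (including late data) still need work.

```python
# Conceptual sketch (not SQLMesh's actual implementation): a stateful runner
# records which days each model has already processed, so a later run computes
# only the missing days and never processes the same day twice.
from datetime import date, timedelta


def missing_intervals(processed, start, end):
    """Return the days in [start, end] that are not yet recorded in state."""
    days = []
    d = start
    while d <= end:
        if d not in processed:
            days.append(d)
        d += timedelta(days=1)
    return days


# State left behind by an earlier run that covered Jan 1-3:
state = {date(2025, 1, 1), date(2025, 1, 2), date(2025, 1, 3)}

# A request spanning Jan 2-5 (new data plus an overlapping window):
todo = missing_intervals(state, date(2025, 1, 2), date(2025, 1, 5))
print(todo)  # [datetime.date(2025, 1, 4), datetime.date(2025, 1, 5)] -- Jan 2-3 skipped
```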

Developer Experience: From a developer’s perspective, statefulness can make life easier. With dbt, advanced use cases (like backfilling a specific time range or creating a test environment) often require careful manual steps and knowledge of internal details. SQLMesh was designed to streamline this. Inspired by DevOps tools like Terraform (which tracks infrastructure state), SQLMesh brings that idea to data. Developers can think in higher-level terms (“I want to add this new column and backfill last month’s data”) and trust SQLMesh to figure out the low-level steps because it remembers the pipeline’s history. There’s less tribal knowledge needed and fewer “gotchas.” Stateful means the tool does more of the work for you so that you can focus on business logic instead of pipeline plumbing.

Working with Full Data vs. Samples (dbt’s Sample Mode vs. SQLMesh Virtual Environments)

One of the headline features announced in dbt Core 1.10 is Sample Mode. This is an option to make development and testing faster by building just a subset of your data instead of the entire dataset. For example, you could tell dbt: “Only process the last 3 days of data” or “Only process data from January 2025” when running models during development or continuous integration tests. By working on a small sample (especially for large time-based datasets), the idea is that you get quicker feedback and use less warehouse compute, saving time and money.

By explicitly limiting the data, developers no longer need to wait for the whole dataset to be built just to see if their code works. It’s an improvement for dbt users dealing with huge tables. However, Sample Mode does not use the full data, so it isn’t truly representative of production. It’s like testing a new recipe but only cooking a tiny portion of it – you get a taste, but the full dish might have surprises. If there are anomalies or edge cases outside the sampled window, you might miss them in testing. Something that works fine on 3 days of data could still fail or produce incorrect results when you run it on 3 years of data. In addition, performance characteristics can differ. A fast query on a small sample might slow down significantly on the full dataset. While sampling helps speed up development, it trades off realism – you’re not seeing the full picture of your data.

SQLMesh allows developers to work with full production data in a non-disruptive way. How can you use all the production data without slowing down or interfering with the business? The answer is SQLMesh’s Virtual Environments. A virtual environment in SQLMesh is like a sandbox copy of your data pipeline that you can use for development or testing without copying all the data. SQLMesh can create a new environment almost instantly because it doesn’t duplicate the entire warehouse. Instead, it creates views or references to the production data (Comparisons - SQLMesh). It’s as if you have a mirror of your production tables – you see all the real data and can run transformations on it, but your work stays in the mirror (isolated) until you’re ready to apply it. This means you can run your exact pipeline logic on full datasets and real production records, catching any issues that only appear with complete data. You get the best of both worlds: realism (because it’s the full data) and safety (because it’s not affecting the actual production outputs).
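Conceptually, the virtual layer boils down to views over versioned physical tables (simplified here; SQLMesh's real naming and versioning scheme differ):

```sql
-- Prod is a thin view layer over physical snapshot tables.
CREATE VIEW analytics.orders AS
  SELECT * FROM sqlmesh.analytics__orders__v1;

-- A dev environment reuses prod's snapshot for unchanged models (no copy)...
CREATE VIEW analytics__dev.orders AS
  SELECT * FROM sqlmesh.analytics__orders__v1;

-- ...and points at a freshly built snapshot only for the model you changed.
CREATE VIEW analytics__dev.revenue AS
  SELECT * FROM sqlmesh.analytics__revenue__v2;
```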

Let’s illustrate the difference: With dbt’s Sample Mode, if you’re developing a new weekly sales report model, you might only build the last two weeks of data to test it quickly. You can verify the model runs and outputs data for those two weeks. But perhaps there’s a problem in week 3 that you won’t notice until you run on a more extensive date range later – maybe a particular product category appears that isn’t in the sample, causing an error or a weird result. With SQLMesh, you could create a development environment and run the sales report for the entire year’s data. It might take longer than a two-week sample, but thanks to SQLMesh’s incremental execution and state awareness, it will still be efficient. Crucially, you’ll see the output on the full year of data in your dev environment. If something is going to break in production, it likely breaks in your test, too – so you catch it early. And while you do this, production remains safe: your dev runs don’t touch the live tables. SQLMesh’s virtual environments give you confidence that what you test is what you’ll get in production because you tested on the real thing, not a slice.

Another benefit is precision in identifying time frames. SQLMesh can let you specify or detect exactly which date partitions or intervals need to be processed when you make a change. For example, if you change a model that only affects data from last month onward, SQLMesh can automatically figure that out and only recompute that timeframe in your dev environment. This precise control means you’re not limited to a generic “last N days” sample – you can be efficient and accurate in what you choose to build or rebuild. In summary, while dbt’s Sample Mode offers a speed boost by shrinking the task, SQLMesh’s virtual environments offer speed through smart execution while using complete data. We don’t have to choose between speed and accuracy – we get both.
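On the command line, that control is explicit. A sketch using the SQLMesh CLI (environment name and dates are illustrative):

```shell
# Build or update a dev environment, backfilling only the interval you need
sqlmesh plan dev --start 2025-02-01 --end 2025-02-28

# Later, plan against prod; intervals already computed are not redone
sqlmesh plan
```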

Parsing vs. Computing – A GPS Doesn’t Make Your Car Go Faster

Another big announcement from dbt Developer Day was integrating a new SQL parsing engine (from their SDF acquisition) into dbt Core. The new engine delivers faster parse times, with one demo showing a 10,000-model project parsing in under 1 second. That’s fast! Parsing, in this context, means how quickly dbt can read and understand your project’s SQL and configurations (without actually running the data pipeline). We applaud improvements to the developer experience.

However, it’s important to put this in perspective. Parsing and compilation speed is only a tiny fraction of the total pipeline time and cost. The heavy lifting in any data pipeline is the actual execution of queries on your data warehouse – in other words, the compute. If your transformations have to crunch through billions of rows, that’s where the time (and cloud warehouse credits) go. By analogy: “A GPS doesn’t make your car go faster.” A better navigation system (faster parsing) helps you plan your route quickly and avoid wrong turns, but it doesn’t increase the horsepower of your car’s engine (dbt + SDF: What Changes and What Doesn't). A query that takes 10 minutes to run on your database will still take 10 minutes, even if your tool parses the project in 1 second instead of 10 seconds. The overall pipeline might go from 10 minutes 10 seconds to 10 minutes 1 second – a nice improvement, but not a game-changer for the end-to-end speed.
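The arithmetic behind that analogy is worth spelling out, here as a quick back-of-the-envelope calculation using the same numbers:

```python
# Back-of-the-envelope math for the GPS analogy: a 10x faster parser barely
# moves end-to-end runtime when warehouse compute dominates.
parse_old, parse_new = 10.0, 1.0  # seconds spent parsing the project
compute = 600.0                   # 10 minutes of actual warehouse queries

total_old = compute + parse_old   # 610 seconds end to end
total_new = compute + parse_new   # 601 seconds end to end
speedup = total_old / total_new
print(f"end-to-end speedup: {speedup:.3f}x")  # end-to-end speedup: 1.015x
```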

Why does this distinction matter? It matters because focusing on parse speed alone can give a false impression of a dramatically faster pipeline. A parser that is 100x faster does not mean your warehouse workloads are 100x faster. If you run a full refresh of a large project, you might barely notice the parsing time; the database work overshadows it. Warehouse compute is also where most of the cost is – vendors charge based on bytes processed or CPU time running queries, not on how quickly your local tool can parse SQL files. Reducing parse time has a negligible impact on your cloud bill compared to not rerunning a heavy query unnecessarily.

This is why SQLMesh’s stateful approach focuses on optimizing the compute side. By remembering what’s been done, SQLMesh avoids re-running expensive queries if they are not needed. For example, if a model hasn’t changed and its source data hasn’t changed, SQLMesh can skip re-computing it and use the previously computed result. That’s like not driving down a road you’ve already covered. This can save minutes or hours of run time, whereas shaving seconds off parse time is nice but minor. To be clear, we aren’t saying parsing speed has no value – it does improve the developer experience when editing and testing models (and SQLMesh also parses SQL efficiently using our own engine, SQLGlot). But when it comes to pipeline execution speed and cost, reducing the amount of data scanned and processed is what has the most impact. A faster parser is great, but a more intelligent pipeline is even better. Think of it this way: a good GPS (parser) will plot the quickest route, but if you can also avoid unnecessary trips, you save much more time and fuel.
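The skip-if-unchanged idea can be illustrated with a small fingerprinting sketch. This is a conceptual toy, not SQLMesh's actual code, but it captures the mechanism: a model reruns only when its own definition, or something upstream of it, changes relative to what state remembers.

```python
# Conceptual sketch (not SQLMesh's actual implementation): fingerprint each
# model from its own SQL plus its parents' fingerprints, and rebuild only
# when the fingerprint recorded in state no longer matches.
import hashlib


def fingerprint(sql, parent_fingerprints):
    """Hash a model's SQL together with its parents' fingerprints."""
    h = hashlib.sha256(sql.encode())
    for parent_fp in parent_fingerprints:
        h.update(parent_fp.encode())
    return h.hexdigest()


state = {}  # model name -> fingerprint recorded after the previous run


def needs_rebuild(name, sql, parent_fingerprints):
    """True if the model (or anything upstream of it) changed since last run."""
    fp = fingerprint(sql, parent_fingerprints)
    changed = state.get(name) != fp
    state[name] = fp
    return changed


# First run: no state yet, so the model builds.
assert needs_rebuild("stg_orders", "SELECT * FROM raw.orders", [])
# Second run, identical SQL: fingerprint matches state, so it is skipped.
assert not needs_rebuild("stg_orders", "SELECT * FROM raw.orders", [])
# A change to the SQL (or to a parent's fingerprint) triggers a rebuild.
assert needs_rebuild("stg_orders", "SELECT * FROM raw.orders WHERE id > 0", [])
```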

Robust Testing, Reproducibility, and Easy Promotions with Stateful Architecture

The most important advantage of a stateful architecture is how it improves testing and releasing changes in a data pipeline. In software development, we take for granted tools that help with testing and deploying code safely (like version control, staging environments, and so on). SQLMesh brings many of those best practices to data pipelines through its design.

Because SQLMesh creates virtual data environments quickly and efficiently, it’s easy to set up a staging area for any change. You can develop a change in a dev environment (with full data as we discussed), then promote that environment to staging for further testing or for a teammate to review. Each environment (dev, staging, prod) has the pipeline code and the data state that goes with it. This means tests can be run on a staging environment that is identical to what production will be, down to the data. If you have data quality checks or assertions, you run them in staging, knowing the data is the real deal, not a sample. This catches issues that could have slipped by otherwise. It also boosts confidence: when tests pass in SQLMesh staging, you can trust that production will see the same results because production will get the exact same calculations.

Reproducibility is another benefit. Since SQLMesh keeps track of state and data versions, you can recreate any past pipeline state if needed. For example, suppose an issue is discovered in a report generated last week. With SQLMesh, you could spin up an environment using last week’s code and data pointers to investigate exactly what happened. It’s like a time machine for your data pipeline. In a stateless system you might still have the old code, but re-running it today could yield different results if the underlying data changed or if you can’t easily get the historical data into the same shape. SQLMesh’s approach ensures you can reproduce and verify past results when needed, which is crucial for debugging and auditing.

Finally, consider promotion to production – the step where you take a tested change and make it live for your end users. In dbt, there isn’t a built-in concept of “promoting” an environment; typically, you’d just run the models in the production schema after testing. But because dbt is stateless, deploying means re-running all the necessary queries against production. Even if you just tested everything in staging, you have to do it all again, spending time and compute, and hoping nothing new goes wrong. There’s also a chance that between your test run and the production run, new data arrived or something changed, which could introduce differences. The production run is a fresh action that could diverge from what you tested.

SQLMesh handles promotion more predictably and efficiently. Since a stateful system knows the results of your staging environment, it can promote those results to production without re-computation. In SQLMesh, promotion is basically a pointer swap (Comparisons - SQLMesh). The data produced in staging is now designated as the official production output. This can be almost instantaneous and uses no extra compute power because you’re not re-running all the transformations – you’re just saying, “Okay, these tables that staging was writing to are now the official ones that prod should read from.” It’s like flipping a switch. The version you tested becomes the live version. If something goes wrong and you need to roll back, that’s a switch to the previous state, not a full rerun. This approach makes deployments faster and far less risky.
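Under the hood, a promotion of this kind is roughly a view redefinition (simplified; real object names differ). Nothing is recomputed and no data moves:

```sql
-- Prod previously read version 1; staging already built and validated v2.
-- Promotion just re-points the production view at the validated snapshot.
CREATE OR REPLACE VIEW analytics.revenue AS
  SELECT * FROM sqlmesh.analytics__revenue__v2;
```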

A stateful architecture gives you DevOps superpowers for data pipelines. You get easy testing environments, the ability to compare outputs between versions (SQLMesh even has a table diff feature to show how data changed between one version and another), and confidence that you can promote changes safely, knowing exactly what data is going live. All of this contributes to higher quality and reliability. You can move fast with guardrails in place. Conversely, a stateless setup requires more manual work to achieve similar safety. Teams often have to implement custom scripts or processes to mimic environment management, or they take the risk of quick deployments without full production-data testing. The stateful method is a more robust and scalable solution as your team and data grow.

Conclusion

dbt’s Developer Day 2025 announcements underline a clear theme: speeding up data development while keeping quality high. As long-time believers in those same goals, we’re excited to see these improvements in the data tooling ecosystem. Features like Sample Mode and a faster parser in dbt Core 1.10 show that dbt is addressing some of the pain points analytics engineers have faced. However, the core architecture differences between dbt and SQLMesh lead to different solutions to these problems. SQLMesh’s stateful design tackles speed and quality at the root by remembering and leveraging the state of your data rather than working around the lack of it.

At Tobiko we believe that stateful data pipelines are the future of data engineering. We built SQLMesh to embody that philosophy, which has paid off in terms of performance and reliability gains for our users. The announcements from dbt Labs validate the importance of developer experience and speed, and we’re proud to already offer those advantages through a more holistic, state-aware solution. Ultimately, dbt and SQLMesh share the same goal: to help data teams deliver correct data faster. The difference lies in how we get there. SQLMesh can optimize your data pipelines at scale by taking a state-aware approach.

{{banner-slack}}