dbt (data build tool) is a widely adopted framework for transforming raw data already loaded into your warehouse into clean, tested, documented analytics models. It brings software engineering practices (version control, testing, documentation, and CI/CD) to data transformation and is used by a large share of data teams running a modern data stack.
Product Overview
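dbt works by compiling SQL SELECT statements (templated with Jinja) that define how raw source data should be transformed, then wrapping each one in the DDL needed to build it as a view or table in the warehouse. Models reference each other with the ref() function, which lets dbt infer the DAG (directed acyclic graph) of dependencies and build models in the correct order. Teams write modular SQL, test data quality, and generate documentation automatically. dbt Cloud adds a hosted IDE, a job scheduler, and an observability layer.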
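To make the compile-and-build cycle concrete, here is a toy Python sketch of it; it is not how dbt is implemented. It assumes a hypothetical project held in a dict (MODELS), recognises only the single-argument {{ ref('name') }} form, uses SQLite in place of a real warehouse, and supports only the view and table materialisations. It extracts ref() calls to build the dependency graph, orders models with the standard library's graphlib.TopologicalSorter (Python 3.9+), and wraps each SELECT in a CREATE VIEW or CREATE TABLE ... AS statement.

```python
import re
import sqlite3
from graphlib import TopologicalSorter

# Hypothetical project: model name -> (materialisation, SELECT statement).
# Real dbt models are .sql files rendered with Jinja; this toy only
# recognises the single-argument form {{ ref('model_name') }}.
MODELS = {
    "stg_orders": (
        "view",
        "select order_id, customer_id, amount from raw_orders "
        "where amount is not null",
    ),
    "customer_totals": (
        "table",
        "select customer_id, sum(amount) as total_amount "
        "from {{ ref('stg_orders') }} group by customer_id",
    ),
}

REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"](\w+)['\"]\s*\)\s*\}\}")


def build_dag(models):
    """Map each model name to the set of models it selects from via ref()."""
    return {name: set(REF_PATTERN.findall(sql)) for name, (_, sql) in models.items()}


def compile_sql(sql):
    """Replace every ref() call with the bare relation name."""
    return REF_PATTERN.sub(lambda m: m.group(1), sql)


def run_project(conn, models):
    """Materialise every model, parents before children; return the build order."""
    dag = build_dag(models)
    missing = {parent for parents in dag.values() for parent in parents} - set(models)
    if missing:
        raise ValueError(f"ref() to undefined model(s): {sorted(missing)}")
    # static_order() raises graphlib.CycleError if the refs form a cycle.
    order = list(TopologicalSorter(dag).static_order())
    for name in order:
        kind, sql = models[name]
        if kind not in ("view", "table"):
            # dbt also offers incremental and ephemeral; this toy does not.
            raise ValueError(f"unsupported materialisation: {kind}")
        # Wrap the SELECT in DDL so the database builds it as a view or table.
        conn.execute(f"create {kind} {name} as {compile_sql(sql)}")
    return order


def demo():
    conn = sqlite3.connect(":memory:")
    # Hypothetical raw table standing in for data already loaded into the warehouse.
    conn.executescript(
        """
        create table raw_orders (order_id integer, customer_id integer, amount real);
        insert into raw_orders values (1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5), (4, 2, null);
        """
    )
    order = run_project(conn, MODELS)
    rows = conn.execute(
        "select customer_id, total_amount from customer_totals order by customer_id"
    ).fetchall()
    return order, rows


if __name__ == "__main__":
    print(demo())
```

Because customer_totals refs stg_orders, the view is always built first. If two models ref each other, static_order() raises graphlib.CycleError, which is why the dependency graph has to stay acyclic.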
Key Features
- SQL-first Transformations: Define data models as SELECT statements; dbt handles materialisation (view, table, incremental, or ephemeral) and dependency ordering.
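- Data Testing: Built-in generic tests (not_null, unique, accepted_values, relationships) cover null checks, uniqueness, allowed values, and referential integrity; custom SQL tests handle arbitrary conditions.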
- Auto Documentation: Generates a searchable data catalogue from model and column descriptions.
- DAG Lineage: Visual lineage graph showing how every model relates to source data and downstream reports.
- dbt Cloud: Hosted IDE, job scheduler, alerts, and CI/CD for production dbt workflows.
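The testing model is simple: each test is a query that selects failing rows, and the test passes when that query returns nothing. The Python sketch below approximates the not_null, unique, and relationships checks against an in-memory SQLite database. The query text is a simplified approximation of what dbt generates, and the orders and customers tables are hypothetical.

```python
import sqlite3

# Simplified approximations of the failing-rows queries behind dbt's built-in
# generic tests. Identifiers are interpolated directly, so only pass trusted,
# hard-coded names. A test passes when its query returns no rows.


def not_null_sql(model, column):
    return f"select * from {model} where {column} is null"


def unique_sql(model, column):
    return (
        f"select {column}, count(*) as n_records from {model} "
        f"where {column} is not null group by {column} having count(*) > 1"
    )


def relationships_sql(model, column, to_model, to_field):
    # Child rows whose non-null key has no matching row in the parent model.
    return (
        f"select child.{column} from {model} as child "
        f"left join {to_model} as parent on child.{column} = parent.{to_field} "
        f"where child.{column} is not null and parent.{to_field} is null"
    )


def run_test(conn, sql):
    """Return the number of failing rows; 0 means the test passes."""
    return len(conn.execute(sql).fetchall())


def demo():
    conn = sqlite3.connect(":memory:")
    # Hypothetical tables standing in for materialised dbt models.
    conn.executescript(
        """
        create table customers (customer_id integer);
        insert into customers values (1), (2);
        create table orders (order_id integer, customer_id integer);
        insert into orders values (10, 1), (11, 2), (11, 3), (12, null);
        """
    )
    checks = {
        "not_null(orders.order_id)": not_null_sql("orders", "order_id"),
        "unique(orders.order_id)": unique_sql("orders", "order_id"),
        "not_null(orders.customer_id)": not_null_sql("orders", "customer_id"),
        "relationships(orders.customer_id -> customers.customer_id)": relationships_sql(
            "orders", "customer_id", "customers", "customer_id"
        ),
    }
    return {name: run_test(conn, sql) for name, sql in checks.items()}


if __name__ == "__main__":
    for name, failures in demo().items():
        status = "PASS" if failures == 0 else f"FAIL ({failures} failing)"
        print(f"{name}: {status}")
```

In the sample data, order_id 11 appears twice, one order has a null customer_id, and one references a customer (3) that does not exist, so three of the four checks report failures.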
Best For
Data engineering and analytics teams building a modern analytics stack who want consistent, tested, documented data transformations in Snowflake, BigQuery, Redshift, or a similar cloud warehouse.
Pricing
dbt Core: open source (free). dbt Cloud: Developer, free; Team, $100/seat/month; Enterprise, custom pricing.
Key Integrations
Snowflake, BigQuery, Redshift, Databricks, DuckDB, GitHub, Looker, Tableau
Pros
- De facto industry standard with a large, active community
- Brings engineering rigour to analytics
- Excellent documentation generation
- Open source core
Cons
- Requires SQL and data engineering knowledge
- Not a no-code tool
- dbt Cloud costs rise steeply as seat count grows