The dbt Relationships Test is Not Enough
If you use dbt, you probably have some relationships tests in your YAML. They look like this:
models:
- name: orders
columns:
- name: user_id
tests:
- relationships:
to: ref('users')
field: id
This test verifies that every user_id in the orders table exists in the users table. It catches orphaned references, which is genuinely useful. The problem is what it does not catch.
What the Test Actually Does
The dbt relationships test runs a query that looks roughly like:
SELECT COUNT(*) FROM orders WHERE user_id NOT IN (SELECT id FROM users)
If the count is zero, the test passes. If any user_id values in orders do not have a matching id in users, the test fails. Simple, reliable, useful.
What It Misses
Relationships You Did Not Define
The test only runs on relationships you explicitly configure. If your orders table has a promo_code column that references a promotions table, but nobody added that relationship test, it does not get checked. The test cannot find relationships it does not know about.
This is the fundamental limitation. You are testing the relationships you already documented, which are the ones you were least likely to get wrong. The dangerous relationships are the undocumented ones.
Relationships Across Sources
dbt relationships tests work within your dbt project. If your staging models reference source tables that live outside your dbt project, or if relationships span multiple dbt projects, testing gets complicated. You can test against sources, but in practice teams rarely achieve complete coverage.
Cardinality Issues
The relationships test confirms referential integrity (every child value exists in the parent). It does not check cardinality. If your data model assumes a one-to-one relationship between users and profiles, but the profiles table actually has duplicates, the relationships test passes while your JOINs silently produce wrong results.
Stale Definitions
As schemas evolve, new columns get added that establish new relationships. The dbt YAML does not update itself. Unless someone manually adds a new relationships test every time a column is added, coverage degrades over time.
Closing the Gap
Automated Discovery
Run a relationship discovery tool against your database periodically. Compare discovered relationships against your dbt YAML. Any relationship that exists in the database but is not tested in dbt is a gap.
Squish does this automatically. Connect your database, run discovery, and export the results. Diff them against your existing dbt YAML to find untested relationships.
Cardinality Tests
Supplement relationships tests with cardinality assertions. dbt-utils and dbt-expectations both provide tests for uniqueness and accepted values that can catch cardinality violations.
For critical one-to-one relationships, add an explicit uniqueness test on the child column:
models:
- name: user_profiles
columns:
- name: user_id
tests:
- unique
- relationships:
to: ref('users')
field: id
The relationships test confirms every user_id exists in users. The unique test confirms there is only one profile per user. Together they enforce a true one-to-one relationship.
Coverage Tracking
Track what percentage of your columns with _id or _key suffixes have corresponding relationships tests. This is a rough but useful proxy for relationship test coverage. If you have 80 columns ending in _id and only 30 relationships tests, you have gaps.
Source Freshness as a Proxy
If a source table changes schema (new columns added, columns renamed), that is a signal that your relationship definitions may need updating. Pair source freshness checks with periodic schema diffing to catch drift.
A Practical Workflow
The dbt relationships test is useful but not complete. Pair it with automated discovery for more comprehensive relationship validation.