Data Contracts Need Relationship Context
Data Contracts Solve Half the Problem
Data contracts have become one of the most discussed concepts in modern data engineering. The idea is straightforward: producers of data define a contract that specifies the schema, data types, freshness guarantees, and quality expectations for their dataset. Consumers depend on that contract. Changes require negotiation.
This is a genuine improvement over the previous state of affairs, where schema changes propagated downstream without warning and broke dashboards, models, and pipelines. Contracts bring discipline and predictability to data handoffs.
But contracts, as typically implemented, cover individual datasets in isolation. They specify what a table promises about itself. They rarely specify what a table promises about its relationships to other tables.
What Contracts Cover Today
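A typical data contract specifies the properties already mentioned: the schema, the data types, freshness guarantees, and quality expectations such as uniqueness and non-null columns.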
These are all table-scoped. They describe properties of one dataset. This is useful and necessary, but it leaves a gap.
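To make "table-scoped" concrete, here is a minimal sketch of what such a contract and its check might look like. The `TableContract` class, its field names, and the `check_table` function are hypothetical and invented for illustration; they do not correspond to any particular contract tool. Note that every check reads rows from one table only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class TableContract:
    """Hypothetical table-scoped contract; field names are illustrative."""
    table: str
    columns: dict             # column name -> expected Python type
    non_null: tuple           # columns that must never be None
    unique: tuple             # columns whose values must not repeat
    max_staleness: timedelta  # freshness guarantee


def check_table(contract, rows, last_loaded_at, now=None):
    """Return violation messages; an empty list means the contract holds."""
    now = now or datetime.now(timezone.utc)
    violations = []
    for i, row in enumerate(rows):
        # Schema and data types.
        for col, expected in contract.columns.items():
            if col not in row:
                violations.append(f"row {i}: missing column {col}")
            elif row[col] is not None and not isinstance(row[col], expected):
                violations.append(f"row {i}: {col} is not {expected.__name__}")
        # Quality expectation: required columns are populated.
        for col in contract.non_null:
            if row.get(col) is None:
                violations.append(f"row {i}: {col} is null")
    # Quality expectation: key columns do not repeat.
    for col in contract.unique:
        values = [row.get(col) for row in rows]
        if len(values) != len(set(values)):
            violations.append(f"{col} has duplicate values")
    # Freshness guarantee.
    if now - last_loaded_at > contract.max_staleness:
        violations.append("data is staler than the contract allows")
    return violations


customers_contract = TableContract(
    table="customers",
    columns={"customer_id": int, "email": str},
    non_null=("customer_id",),
    unique=("customer_id",),
    max_staleness=timedelta(hours=24),
)
now = datetime(2024, 1, 2, tzinfo=timezone.utc)
rows = [
    {"customer_id": 1, "email": "a@example.com"},
    {"customer_id": 2, "email": None},
]
violations = check_table(
    customers_contract, rows, last_loaded_at=now - timedelta(hours=1), now=now
)
print(violations)
```

Nothing in this structure has a place to say anything about another table, which is the gap the rest of this article is about.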
The Missing Piece
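What contracts typically do not cover: whether every foreign key value resolves to a row in the parent table, what cardinality a JOIN between two tables has, and which subset of records the relationship is guaranteed to hold for.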
These are relationship-level properties. They exist at the intersection of two datasets, not within either one individually.
When Contracts Break at JOIN Boundaries
The limitation becomes visible when a contract-compliant change in one table breaks queries that JOIN it with another table.
Scenario: Team A owns the customers table. Their contract specifies that customer_id is a unique, non-null primary key. Team B owns the orders table. Their contract specifies that customer_id is a non-null foreign key column.
Team A decides to implement soft deletes. Instead of removing rows, they add a deleted_at column and treat any row where it is set as deleted. Their contract still holds: customer_id is still unique and non-null. But the orders table now references customer records that are logically deleted.
Any downstream query that JOINs orders with customers now includes deleted customers, unless it has been updated to filter on deleted_at. The contract for each table is intact. The relationship between them is broken.
Without a contract that governs the relationship, this class of issue goes undetected until someone notices wrong numbers in a dashboard.
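The scenario can be reproduced in a few lines. The sketch below uses Python's built-in sqlite3 module with made-up rows; the table and column names follow the scenario, while the name and amount columns and all the data are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        deleted_at TEXT            -- NULL means the customer is active
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        amount REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ada', NULL), (2, 'Grace', '2024-03-01');
    INSERT INTO orders VALUES (10, 1, 50.0), (11, 2, 70.0), (12, 2, 30.0);
""")

# Table-level checks: both contracts still hold after the soft delete.
dup_ids = conn.execute(
    "SELECT COUNT(*) - COUNT(DISTINCT customer_id) FROM customers"
).fetchone()[0]
null_fks = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE customer_id IS NULL"
).fetchone()[0]

# A downstream query written before soft deletes existed.
revenue_all = conn.execute(
    "SELECT SUM(o.amount) FROM orders o JOIN customers c USING (customer_id)"
).fetchone()[0]

# Relationship-level check: orders that point at logically deleted customers.
orders_on_deleted = conn.execute(
    """SELECT COUNT(*) FROM orders o
       JOIN customers c USING (customer_id)
       WHERE c.deleted_at IS NOT NULL"""
).fetchone()[0]

print("duplicate customer ids:", dup_ids)
print("null foreign keys:", null_fks)
print("revenue from unfiltered JOIN:", revenue_all)
print("orders on deleted customers:", orders_on_deleted)
```

Both table-level checks come back clean, yet the unfiltered JOIN still counts revenue from the soft-deleted customer. Only the last query, which looks at both tables at once, surfaces the problem.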
Relationship Contracts: What They Would Look Like
A relationship contract would specify the properties of a relationship between two datasets:
Referential integrity. Every value in the child column exists in the parent column. This is the foreign key guarantee, formalized as a contract term.
Cardinality. The relationship is one-to-many, one-to-one, or many-to-many. This affects how consumers can safely aggregate data after JOINs.
Consistency scope. The relationship holds for all records, only active records, only records within a date range, or some other defined subset.
Change protocol. If either side needs to make a change that affects the relationship (like adding soft deletes), both parties are notified and the impact on the relationship is assessed before the change ships.
Validation. Automated tests verify the relationship properties on a regular cadence, just as current contracts validate individual table properties.
This is not theoretical. It is how foreign key constraints work in operational databases, formalized for the modern data stack, where such constraints are often unavailable or declared but not enforced by the storage layer.
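As a rough sketch of how the referential integrity, cardinality, and consistency scope terms could be expressed and validated together, consider the following. The `RelationshipContract` class, its fields, and `validate_relationship` are hypothetical names invented here, not an existing standard or library. The sketch only handles one-to-one and one-to-many relationships and models consistency scope as a predicate over parent rows.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RelationshipContract:
    """Hypothetical relationship contract; all field names are illustrative."""
    parent_table: str
    parent_key: str
    child_table: str
    child_key: str
    cardinality: str                      # "one-to-one" or "one-to-many"
    parent_scope: Callable[[dict], bool]  # which parent rows are valid targets


def validate_relationship(contract, parent_rows, child_rows):
    """Return violation messages; an empty list means the relationship holds."""
    violations = []
    # Consistency scope: only in-scope parent rows count as valid targets.
    in_scope = [r for r in parent_rows if contract.parent_scope(r)]
    parent_counts = Counter(r[contract.parent_key] for r in in_scope)
    child_counts = Counter(r[contract.child_key] for r in child_rows)

    # Referential integrity: every child key resolves to an in-scope parent.
    orphans = sorted(k for k in child_counts if k not in parent_counts)
    if orphans:
        violations.append(f"child keys with no in-scope parent: {orphans}")

    # Cardinality: the parent key must be unique on the "one" side.
    if contract.cardinality in ("one-to-one", "one-to-many"):
        repeated = sorted(k for k, n in parent_counts.items() if n > 1)
        if repeated:
            violations.append(f"parent keys repeated: {repeated}")

    # One-to-one additionally forbids several children per parent.
    if contract.cardinality == "one-to-one":
        repeated = sorted(k for k, n in child_counts.items() if n > 1)
        if repeated:
            violations.append(f"child keys repeated: {repeated}")
    return violations


contract = RelationshipContract(
    parent_table="customers", parent_key="customer_id",
    child_table="orders", child_key="customer_id",
    cardinality="one-to-many",
    parent_scope=lambda row: row["deleted_at"] is None,  # active customers only
)
customers = [
    {"customer_id": 1, "deleted_at": None},
    {"customer_id": 2, "deleted_at": "2024-03-01"},
]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 2},
]
result = validate_relationship(contract, customers, orders)
print(result)
```

With the scope set to active customers, the soft-delete change from the earlier scenario shows up as a violation for customer 2. If the scope accepted all rows instead, the same data would pass, which is why the scope needs to be an explicit contract term rather than an unstated assumption.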
Contract Enforcement With Discovery
The practical challenge with relationship contracts is knowing which relationships exist. You cannot write a contract for a relationship you do not know about.
This is where automated discovery becomes foundational. Before you can define relationship contracts, you need a complete inventory of relationships across your data platform. Many of these relationships are implicit, undocumented, and unknown to the teams that own the individual tables.
Discovery provides the inventory. For each discovered relationship, the discovery results include the tables involved, the columns that connect them, the cardinality, and a confidence score. This is the raw material from which relationship contracts can be written.
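To give a feel for the shape of such results, here is a deliberately naive sketch. It assumes one simple heuristic: a column is a candidate parent key if its values are unique, and the confidence score is the share of distinct child values found in that column. Real discovery tools may work quite differently, and this heuristic can produce false positives on real data (for example, two unrelated columns of small integers). The `DiscoveredRelationship` class and `discover` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiscoveredRelationship:
    """Hypothetical discovery result; mirrors the fields named in the text."""
    parent_table: str
    parent_column: str
    child_table: str
    child_column: str
    cardinality: str
    confidence: float


def discover(tables, min_confidence=0.9):
    """tables maps table name -> {column name -> list of values}."""
    results = []
    for p_table, p_cols in tables.items():
        for p_col, p_values in p_cols.items():
            p_set = set(p_values)
            if not p_set or len(p_set) != len(p_values):
                continue  # values repeat, so not a candidate parent key
            for c_table, c_cols in tables.items():
                if c_table == p_table:
                    continue
                for c_col, c_values in c_cols.items():
                    c_set = set(c_values)
                    if not c_set:
                        continue
                    # Assumed score: share of distinct child values in the parent.
                    confidence = len(c_set & p_set) / len(c_set)
                    if confidence < min_confidence:
                        continue
                    one_to_one = len(c_set) == len(c_values)
                    results.append(DiscoveredRelationship(
                        p_table, p_col, c_table, c_col,
                        "one-to-one" if one_to_one else "one-to-many",
                        confidence,
                    ))
    return results


tables = {
    "customers": {"customer_id": [1, 2, 3], "name": ["Ada", "Grace", "Alan"]},
    "orders": {"order_id": [10, 11, 12], "customer_id": [1, 1, 2]},
}
found = discover(tables)
for rel in found:
    print(rel)
```

Each result carries enough information to draft a relationship contract, though a human still has to confirm that a discovered relationship is real and decide which scope it should hold for.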
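The workflow becomes: discover the relationships that exist, write a contract for each one that consumers depend on, and validate those contracts on a regular cadence.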
Data contracts were a step forward from no contracts. Relationship contracts are the next step. The relationships between your datasets are just as important as the datasets themselves, and they deserve the same level of formal specification and automated validation.