What Happens to Foreign Keys When You Migrate to a Data Warehouse

The Migration Nobody Talks About

Every company that adopts a data warehouse goes through the same migration. Data moves from PostgreSQL, MySQL, or another operational database into Snowflake, BigQuery, or Redshift through an ETL or ELT pipeline. Tables arrive. Columns arrive. Rows arrive. Foreign key constraints do not.

This is not a bug. It is by design. But the consequences are rarely discussed until an analyst writes a JOIN that silently produces wrong results because the relationship it depends on was never carried over.

How Foreign Keys Work in Operational Databases

In PostgreSQL or MySQL, a foreign key constraint is an explicit declaration: this column in this table references that column in that table. The database enforces it. If you try to insert an order with a customer_id that does not exist in the customers table, the database rejects the INSERT.

This enforcement does two things. It prevents bad data from entering the system. And it documents the relationship in the schema itself. Anyone who queries information_schema can see exactly how tables relate to each other.

A typical operational database might have dozens of these constraints. They form a graph of relationships that describes the entire data model.

What ETL Pipelines Actually Copy

When Fivetran, Airbyte, or a custom pipeline copies data from PostgreSQL to Snowflake, it copies the structure and the data. Tables are created. Columns are mapped. Rows are loaded. But the foreign key constraints are stripped out.

The pipeline treats each table as an independent entity. It does not replicate the relationships between tables because the destination warehouse either does not enforce them or actively discourages them.

This means that a database with 15 explicitly defined foreign keys in PostgreSQL becomes a database with zero explicitly defined foreign keys in Snowflake. The data is identical. The documented relationships are gone.

Why Warehouses Skip Constraints

There are three reasons data warehouses do not enforce foreign key constraints, and understanding them helps explain why this is unlikely to change.

Performance. Foreign key checks require lookups on every INSERT, UPDATE, and DELETE. Operational databases handle this at the row level with relatively low overhead. Warehouses load data in bulk, often millions of rows at a time. Checking every foreign key reference during a bulk load would make ingestion dramatically slower.

Flexibility. Warehouses receive data from many sources. An orders table might reference a customers table that has not been loaded yet, or that comes from a different source system entirely. Enforcing referential integrity across these boundaries is impractical.

Philosophy. The warehouse is an analytical system, not an operational one. It stores data for querying, not for transaction processing. The assumption is that data integrity was enforced at the source. The warehouse trusts what it receives.

Snowflake does support foreign key syntax. You can declare them. But Snowflake explicitly states that they are informational only and not enforced. BigQuery and Redshift take similar approaches. The constraint exists as metadata but carries no enforcement.

The Real-World Impact

The loss of foreign key information creates three downstream problems that compound over time.

Analyst trust erodes. An analyst exploring the warehouse schema sees tables with columns that look like they should relate to each other. user_id in the orders table probably references the users table. But "probably" is not certainty. Without explicit foreign keys, every JOIN is an assumption. Experienced analysts develop intuition. New analysts make mistakes.

dbt models inherit the gap. If your dbt project defines relationships in YAML, those definitions are only as complete as the author's knowledge. The relationships that were explicit in PostgreSQL are now implicit knowledge that someone has to remember to document. Some get documented. Many do not.

Semantic layer definitions are incomplete. Tools like dbt MetricFlow and Snowflake Cortex need entity relationships to generate correct JOINs. If a relationship was never declared because it was lost during ETL, the semantic layer has a blind spot. Queries that should traverse that relationship return incomplete or incorrect results.

What You Can Do About It

Document before you migrate. Before running your ETL pipeline, export the foreign key definitions from your source database. A simple query against information_schema gives you every constraint. Save this as a reference document that travels with the data.

Reconstruct in the warehouse. Even though Snowflake does not enforce foreign keys, you can declare them. They serve as documentation for tools and humans that query the schema. Other warehouses have similar informational constraint support.

Automate the discovery. For databases where the constraints were never documented or where the migration happened long ago, automated discovery reconstructs the relationships from schema patterns. Column names, data types, and statistical analysis can identify which columns reference which tables with high confidence.

The foreign keys did not disappear because they were not important. They disappeared because the tools that move data were not designed to carry them. Recognizing this gap is the first step toward closing it.