Why Your ERD is Lying to You

The Diagram Everyone Trusts

Entity-relationship diagrams are one of the oldest tools in database documentation. They show tables as boxes and relationships as lines connecting them. Every database course teaches them. Every architecture review expects them. Most data teams have one pinned in a wiki or generated from a schema visualization tool.

The problem is that ERDs only show what they know about. An ERD generated from explicit foreign key constraints shows the constraints. It does not show the implicit relationships that exist in application code, the cross-database references that cannot be constrained, or the relationships that were dropped during migration. The ERD looks complete. It is not.

What ERDs Show vs What They Leave Out

A typical ERD is generated from one of two sources: manual drawing or automated schema extraction. Manual ERDs show whatever the author knew and remembered to include. Automated ERDs show whatever the database has formally declared.

Both miss the same thing: implicit relationships. A column named customer_id in the orders table that references the customers table but lacks a foreign key constraint is invisible to both approaches. The automated tool cannot detect it because there is no constraint to read. The manual author may not know about it because the relationship exists only in application code.

In our analysis of production databases, the gap averages 3-5x. For every three relationships shown on the ERD, there are nine to fifteen actual relationships in the schema. The ERD is not wrong. It is dramatically incomplete.

How the Gap Grows

Several forces cause the gap between ERDs and reality to widen over time.

ORMs create relationships without constraints. Rails, Django, and SQLAlchemy all define relationships in application code. Whether those relationships become database constraints depends on configuration, version, and developer awareness. Over time, the number of ORM-defined, database-unconstrained relationships grows.

Performance optimization drops constraints. When a table grows to millions of rows and write performance suffers, dropping foreign key constraints is a common optimization. The relationship still exists. The application still enforces it. But the ERD loses it.

Cross-database references are unconstrained by definition. When data moves to a warehouse or when a monolith splits into microservices, relationships that crossed the boundary become implicit. No ERD tool can draw a foreign key that the database system physically cannot represent.

Migration drift accumulates. Database migrations are stressful operations focused on getting data moved safely. Rebuilding constraints is frequently deferred. After enough migrations, the gap between what should be constrained and what is constrained becomes significant.

Consequences of an Incomplete ERD

An incomplete ERD is worse than no ERD in one specific way: it creates false confidence. A team that knows they have no documentation will ask questions and verify assumptions. A team that trusts their ERD will not.

This false confidence leads to several problems:

Analysts miss valid JOINs because the relationship is not shown on the diagram

Data engineers build redundant transformations because they do not realize two tables are related

Semantic layer definitions are incomplete because the known relationships undercount the actual ones

New team members build incorrect mental models based on the incomplete ERD

Generating an Accurate ERD

An accurate ERD requires discovering all relationships, not just the constrained ones. This means combining multiple sources of evidence:

Start with explicit foreign keys from the database schema

Add relationships implied by naming conventions (column names matching table names with standard suffixes)

Validate with type compatibility and cardinality analysis

Score each discovered relationship by confidence

The result is an ERD that shows both explicit and discovered relationships, distinguished visually so users know which ones are formally constrained and which were found through analysis. This honest representation is far more useful than a clean but incomplete diagram.

The ERD on your wiki was accurate when it was created. It has been lying by omission ever since.