Back to Blog
Technical

ORMs and the Hidden Schema Problem

Squish Team
October 8, 2025
8 min read

The Gap Between Application and Database

Object-Relational Mappers exist to let developers think about data in terms of objects instead of tables. A User has many Orders. An Order belongs to a User. These relationships are defined in application code and the ORM translates them into SQL at runtime.

The gap appears when you ask: do these application-level relationships exist in the database schema? The answer depends on which ORM you use, how it was configured, and whether anyone changed the defaults.

How Rails ActiveRecord Handles Relationships

Rails defines relationships in model files:

ruby
1class User < ApplicationRecord
2 has_many :orders
3end
4
5class Order < ApplicationRecord
6 belongs_to :user
7end

These declarations tell Rails how to generate JOINs and eager-load associations. But they do not automatically create foreign key constraints in the database.

Rails migrations create the columns:

ruby
1create_table :orders do |t|
2 t.references :user, null: false
3 t.timestamps
4end

The t.references call creates a user_id column and an index. Whether it also creates a foreign key constraint depends on the version of Rails and whether foreign_key: true was specified. Rails 5+ adds it by default for new projects, but older applications and those upgraded from earlier versions often lack these constraints.

The result is that a Rails application can have dozens of belongs_to and has_many relationships that exist only in Ruby code. The database has the columns but no constraints documenting or enforcing the relationships.

How Django Does It Differently

Django takes a more database-aware approach. When you define a ForeignKey field:

python
1class Order(models.Model):
2 user = models.ForeignKey(User, on_delete=models.CASCADE)

Django's migration system creates both the column and the foreign key constraint by default. The database schema reflects the application model more faithfully than Rails typically does.

However, Django developers can disable constraint creation with db_constraint=False:

python
1class Order(models.Model):
2 user = models.ForeignKey(User, on_delete=models.CASCADE, db_constraint=False)

This is sometimes done for performance reasons or when referencing tables in other databases. When it happens, the relationship exists in Django but not in the database, just like the Rails default case.

SQLAlchemy and the Explicit Choice

SQLAlchemy gives developers the most control and the most rope. Relationships can be defined at the ORM level, the schema level, or both:

python
1class User(Base):
2 __tablename__ = 'users'
3 id = Column(Integer, primary_key=True)
4 orders = relationship("Order", back_populates="user")
5
6class Order(Base):
7 __tablename__ = 'orders'
8 id = Column(Integer, primary_key=True)
9 user_id = Column(Integer, ForeignKey('users.id'))
10 user = relationship("User", back_populates="orders")

The ForeignKey('users.id') in the column definition creates the database constraint. The relationship() declarations create the ORM-level association. You can have one without the other.

In practice, SQLAlchemy developers often define both. But when working with existing databases or legacy schemas, it is common to define only the ORM relationship without the database constraint. The application behaves correctly, but the database schema is incomplete.

What Gets Left Behind

The gap between ORM relationships and database constraints accumulates over time through several mechanisms.

Framework migrations. When a team switches from Rails to Django, or from a monolith to microservices, the new framework rarely audits the database constraints left by the old one. Relationships that existed in the old application code may not be recreated as constraints in the new schema.

Microservice boundaries. When a monolithic application is split into microservices, each service gets its own database. Cross-service relationships that were previously within a single database become cross-database references. Foreign key constraints cannot span databases, so these relationships become implicit.

Test divergence. Test databases are often created with simpler schemas than production. Foreign keys might be skipped in test environments for speed. When tests pass and production deploys succeed, nobody notices that the production schema is also missing constraints that should exist.

Why This Matters for Data Teams

Application developers can afford to ignore the gap because their ORM handles the relationships in code. Data teams cannot. When an analyst or a data engineer queries the database directly, they see only what the schema contains. If a relationship is defined in Rails but not in PostgreSQL, it is invisible to anyone working outside the Rails application.

This means data teams are working with an incomplete picture of the data model. They discover relationships through trial and error, by reading application code, or by asking developers. Each of these methods is slow and unreliable.

The problem gets worse with data warehouses. When an ETL pipeline copies data from the operational database to Snowflake, it copies whatever constraints exist in the source. If the source has no foreign keys because the ORM never created them, the warehouse has even less schema information to work with.

Finding What the ORM Forgot

The solution starts with understanding the gap. Query your database for explicit foreign keys and compare the count against the relationships defined in your ORM models. The difference tells you how many implicit relationships exist.

For Rails: count belongs_to declarations and compare against foreign key constraints in the database. For Django: count ForeignKey fields and check which ones have db_constraint=False. For SQLAlchemy: count relationship() declarations with and without corresponding ForeignKey() column definitions.

Automated discovery tools can find these implicit relationships by analyzing column names, data types, and cardinality patterns. They do not need access to application code because the statistical signals in the schema are strong enough to identify relationships that the ORM created but never constrained.

The ORM did its job for the application. Someone needs to do the same job for the database.

ORMRailsDjangoSQLAlchemyforeign keysdatabase schemaActiveRecord

Ready to discover your database relationships?

Stop spending weeks manually mapping relationships. Squish discovers them in 60 seconds with 95%+ accuracy.