How to Discover Hidden Relationships in Your Database

Database relationships are the foundation of data integrity and efficient querying. In many organizations, undocumented or implicit relationships sit beneath the surface, causing data quality issues and making reliable data models harder to build.

The Hidden Relationship Problem

Many databases contain more relationships than their foreign key constraints formally define. These "hidden" relationships arise from:

1. Legacy Schema Evolution

As databases evolve over time, relationships get added informally. A column named user_id in a transactions table might reference the users table, but without an explicit foreign key constraint, this relationship exists only in application code or developer knowledge.

2. Cross-Database References

Modern data stacks often involve multiple databases. A Snowflake warehouse might contain customer_id values that reference a PostgreSQL operational database, but a foreign key constraint only applies within a single database, so these cross-database relationships cannot be enforced that way.

3. Naming Convention Patterns

Columns following patterns like *_id, *_uuid, or *_key often indicate relationships, but these patterns vary by team and project, making systematic discovery challenging.

Traditional Discovery Methods

Manual Code Review

The most common approach is reviewing application code to understand how tables connect. This works but is time-consuming and error-prone, especially when:

Multiple applications access the same database

Documentation is outdated

Original developers have moved on

Schema Visualization Tools

Traditional ERD tools can visualize explicit foreign keys but miss implicit relationships entirely. This gives an incomplete picture of your data model.

Data Profiling

Sampling data and looking for matching values across columns can reveal relationships, but this approach:

Is computationally expensive for large tables

May produce false positives (matching values that are coincidental)

Requires significant manual analysis

Automated Relationship Discovery

Modern tools like Squish use multi-signal analysis to discover relationships automatically:

Column Name Analysis

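Pattern matching on column names identifies relationship candidates from the naming conventions used in your specific schema. The explicit foreign keys already defined show how related columns tend to be named, and those patterns are then applied to find implicit relationships.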
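
As a rough illustration of name-based matching, the sketch below strips a suffix such as _id, _uuid, or _key from a column name and looks for a table whose name matches the remaining stem. The suffix list, the naive pluralization, and the function name candidate_tables are assumptions made for this example; they are not taken from Squish or any other tool.

```python
import re

# Hypothetical suffix list; real schemas may follow other conventions.
SUFFIX_PATTERN = re.compile(r"^(?P<stem>.+)_(id|uuid|key)$", re.IGNORECASE)


def candidate_tables(column, tables):
    """Return tables whose name matches the column's stem, singular or plural."""
    match = SUFFIX_PATTERN.match(column)
    if not match:
        return []
    stem = match.group("stem").lower()
    # Naive pluralization: irregular plurals (person/people) are not handled.
    guesses = {stem, stem + "s", stem + "es"}
    return [table for table in tables if table.lower() in guesses]


tables = ["users", "transactions", "addresses"]
print(candidate_tables("user_id", tables))     # ['users']
print(candidate_tables("address_id", tables))  # ['addresses']
print(candidate_tables("created_at", tables))  # []
```

A prefixed column such as billing_user_id would not match here, which is one reason name analysis is usually combined with the other signals rather than used alone.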

Data Type Compatibility

Relationship candidates are filtered by data type compatibility. A VARCHAR column is unlikely to reference an INTEGER primary key.
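
One simple way to express this filter is to group SQL types into broad families and keep only candidate pairs from the same family. The families below are a hypothetical, engine-neutral grouping chosen for the example. A real tool might lower a candidate's score instead of discarding it, since the text says only that such a pairing is unlikely.

```python
from typing import Optional

# Hypothetical type families; adjust to the engine you actually use.
TYPE_FAMILIES = {
    "integer": {"SMALLINT", "INT", "INTEGER", "BIGINT"},
    "text": {"CHAR", "VARCHAR", "TEXT"},
    "uuid": {"UUID"},
}


def type_family(sql_type: str) -> Optional[str]:
    """Map a declared type such as 'VARCHAR(36)' to a broad family name."""
    base = sql_type.split("(")[0].strip().upper()
    for family, members in TYPE_FAMILIES.items():
        if base in members:
            return family
    return None


def types_compatible(child_type: str, parent_type: str) -> bool:
    """True when both types are recognized and belong to the same family."""
    child = type_family(child_type)
    parent = type_family(parent_type)
    return child is not None and child == parent


print(types_compatible("INT", "BIGINT"))           # True
print(types_compatible("VARCHAR(36)", "INTEGER"))  # False
```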

Cardinality Analysis

By analyzing value distributions, automated tools can distinguish between one-to-many, many-to-many, and one-to-one relationships.
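
The idea can be sketched by counting how often each key repeats among the linked pairs. The function below is a minimal illustration that assumes the candidate relationship has already been reduced to (left_key, right_key) pairs, for example (user id, order id). The name classify_cardinality and the extra "many-to-one" label are choices made for this example.

```python
from collections import Counter


def classify_cardinality(pairs):
    """Classify a relationship from (left_key, right_key) pairs."""
    unique_pairs = set(pairs)  # ignore duplicate links
    left_counts = Counter(left for left, _ in unique_pairs)
    right_counts = Counter(right for _, right in unique_pairs)
    # A left key linked to several right keys means "one left, many right".
    left_repeats = any(count > 1 for count in left_counts.values())
    right_repeats = any(count > 1 for count in right_counts.values())
    if left_repeats and right_repeats:
        return "many-to-many"
    if left_repeats:
        return "one-to-many"
    if right_repeats:
        return "many-to-one"
    return "one-to-one"


# User 1 has two orders, user 2 has one.
print(classify_cardinality([(1, 10), (1, 11), (2, 12)]))  # one-to-many
```

On a sample rather than the full table, a one-to-one result is only tentative, because repeats may simply not have been sampled.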

Value Overlap Scoring

Statistical analysis of value overlap between columns provides confidence scores for potential relationships, ranking candidates from most to least likely.
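
A common way to score overlap is containment: the fraction of distinct non-null values in the candidate foreign key column that also appear in the candidate parent column. The sketch below uses that measure and sorts candidates by it. It is one possible scoring choice and not necessarily the one any particular tool uses; the sample values are invented for the example.

```python
def containment(child_values, parent_values):
    """Fraction of distinct non-null child values found in the parent column."""
    child = {value for value in child_values if value is not None}
    if not child:
        return 0.0
    parent = set(parent_values)
    return len(child & parent) / len(child)


def rank_candidates(child_values, parents):
    """Return (parent_name, score) pairs sorted from most to least likely."""
    scores = [
        (name, containment(child_values, values))
        for name, values in parents.items()
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Invented sample data: values of transactions.user_id and two candidate parents.
transactions_user_id = [1, 2, 2, 3, None, 9]
parents = {"users.id": [1, 2, 3, 4, 5], "products.id": [2, 9, 40]}
print(rank_candidates(transactions_user_id, parents))
# [('users.id', 0.75), ('products.id', 0.5)]
```

Containment is asymmetric on purpose: a parent table can hold many keys that no child row references yet, and that should not lower the score.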

Best Practices for Relationship Discovery

1. Start with What You Know

Document explicit foreign keys first. These serve as ground truth for validating automated discovery results.
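
Most engines expose declared foreign keys through catalog views or pragmas. The sketch below uses SQLite only because it ships with Python; the table names are invented for the example, and other databases would need a different catalog query (many offer information_schema views). It lists every declared foreign key so the result can serve as a baseline.

```python
import sqlite3


def explicit_foreign_keys(conn):
    """Return sorted (child_table, child_column, parent_table, parent_column)."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        )
    ]
    found = []
    for table in tables:
        # PRAGMA columns: id, seq, table, from, to, on_update, on_delete, match
        for row in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            found.append((table, row[3], row[2], row[4]))
    return sorted(found)


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(id)
    );
    -- No constraint here: this is the kind of implicit relationship
    -- that discovery has to find by other means.
    CREATE TABLE transactions (id INTEGER PRIMARY KEY, user_id INTEGER);
    """
)
print(explicit_foreign_keys(conn))  # [('orders', 'user_id', 'users', 'id')]
```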

2. Involve Domain Experts

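Technical analysis should be validated by people who understand the business context. A 90% value overlap might indicate a relationship or might be coincidental. For example, two unrelated auto-incrementing integer keys that both start at 1 will share most of their values.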

3. Document Everything

Once relationships are discovered, document them. Whether through foreign key constraints, metadata catalogs, or ERD documentation, make implicit knowledge explicit.

4. Automate Ongoing Discovery

Schemas evolve. Set up automated discovery to run periodically, catching new implicit relationships as they emerge.
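
One lightweight way to make periodic runs useful is to compare each run's relationships with the previous run and report only the differences. The sketch below assumes each relationship is stored as a (child_column, parent_column) tuple; that representation and the function name diff_relationships are hypothetical. How the runs are scheduled and where snapshots are stored is left open.

```python
def diff_relationships(previous, current):
    """Compare two discovery runs and report what changed between them."""
    previous_set = set(previous)
    current_set = set(current)
    return {
        "added": sorted(current_set - previous_set),
        "removed": sorted(previous_set - current_set),
    }


# Invented example runs.
last_week = [("orders.user_id", "users.id")]
this_week = [
    ("orders.user_id", "users.id"),
    ("transactions.user_id", "users.id"),
]
print(diff_relationships(last_week, this_week))
# {'added': [('transactions.user_id', 'users.id')], 'removed': []}
```

Reviewing only the added and removed entries keeps the validation effort proportional to how much the schema actually changed.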

Getting Started with Squish

Squish automates the entire relationship discovery process. Connect your database, run a 60-second analysis, and get a complete picture of both explicit and implicit relationships with confidence scores.

The result is an interactive ERD that shows all relationships, color-coded by confidence level, exportable to multiple formats for documentation and team collaboration.

Automated discovery provides comprehensive results in minutes, freeing your team to focus on building data products instead of mapping schemas.