How to Discover Hidden Relationships in Your Database
Database relationships are the foundation of data integrity and efficient querying. In most organizations, undocumented or implicit relationships exist beneath the surface, causing data quality issues and making it harder to build reliable data models.
The Hidden Relationship Problem
Most databases contain more relationships than their foreign key constraints formally define. These "hidden" relationships arise from:
1. Legacy Schema Evolution
As databases evolve over time, relationships get added informally. A column named user_id in a transactions table might reference the users table, but without an explicit foreign key constraint, this relationship exists only in application code or developer knowledge.
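A quick way to test a suspected implicit reference like transactions.user_id → users is to count child values that have no match on the other side. The sketch below uses Python's built-in sqlite3 module with an in-memory database. The table layouts, including a users.id primary key, and the helper name count_orphans are assumptions for illustration.

```python
import sqlite3

def count_orphans(conn, child_table, child_col, parent_table, parent_col):
    """Count non-NULL child values that have no matching parent row (hypothetical helper)."""
    # Identifiers are interpolated directly, so only pass trusted table/column names.
    query = (
        f"SELECT COUNT(*) FROM {child_table} AS c "
        f"LEFT JOIN {parent_table} AS p ON c.{child_col} = p.{parent_col} "
        f"WHERE c.{child_col} IS NOT NULL AND p.{parent_col} IS NULL"
    )
    return conn.execute(query).fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE transactions (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL);
    INSERT INTO users VALUES (1, 'ada'), (2, 'grace');
    INSERT INTO transactions VALUES (10, 1, 9.5), (11, 2, 3.0), (12, 3, 1.25);
""")

orphans = count_orphans(conn, "transactions", "user_id", "users", "id")
print(orphans)  # 1 (user_id 3 has no matching user)
```

A count of zero does not prove the relationship exists, but a large count is strong evidence against it.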
2. Cross-Database References
Modern data stacks often involve multiple databases. A Snowflake warehouse might contain customer_id values that reference a PostgreSQL operational database, but these cross-database relationships are impossible to enforce with traditional constraints.
3. Naming Convention Patterns
Columns following patterns like *_id, *_uuid, or *_key often indicate relationships, but these patterns vary by team and project, making systematic discovery challenging.
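As a starting point, a simple suffix scan can flag candidate columns and guess their target table. This is a minimal sketch: the suffix list, the naive "add an s" pluralization, and the function name candidate_references are assumptions, and a real schema will need its own conventions.

```python
import re

# Suffixes treated as relationship hints; adjust to your team's conventions.
CANDIDATE_SUFFIX = re.compile(r"^(?P<stem>[a-z0-9_]+?)_(?:id|uuid|key)$")

def candidate_references(columns_by_table):
    """Yield (table, column, guessed_target_table) for suffix-matching columns."""
    tables = set(columns_by_table)
    for table, columns in columns_by_table.items():
        for column in columns:
            match = CANDIDATE_SUFFIX.match(column.lower())
            if not match:
                continue
            stem = match.group("stem")
            # Try the stem as-is, then with a naive plural "s".
            for target in (stem, stem + "s"):
                # Skip self-matches to keep the sketch simple.
                if target in tables and target != table:
                    yield table, column, target
                    break

schema = {
    "users": ["id", "email"],
    "transactions": ["id", "user_id", "amount"],
    "orders": ["id", "customer_uuid", "user_id"],
}
found = list(candidate_references(schema))
print(found)  # [('transactions', 'user_id', 'users'), ('orders', 'user_id', 'users')]
```

Note that orders.customer_uuid is flagged by the suffix but dropped because no customer or customers table exists, which is exactly the kind of gap a cross-database reference leaves.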
Traditional Discovery Methods
Manual Code Review
The most common approach is reviewing application code to understand how tables connect. This works, but it is time-consuming and error-prone.
Schema Visualization Tools
Traditional ERD tools can visualize explicit foreign keys but miss implicit relationships entirely. This gives an incomplete picture of your data model.
Data Profiling
Sampling data and looking for matching values across columns can reveal relationships, but matching values can also be coincidental, so the results still need validation.
Automated Relationship Discovery
Modern tools like Squish use multi-signal analysis to discover relationships automatically:
Column Name Analysis
Advanced pattern matching identifies relationship candidates based on naming conventions across your specific schema, learning from explicit relationships to find implicit ones.
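One way to "learn" conventions from explicit relationships is to strip the target table's name from each declared foreign key column and count the suffixes that remain, then apply those suffixes to unconstrained columns. This is a hypothetical sketch, not Squish's method; the pluralization rule and function names are assumptions.

```python
from collections import Counter

def stem_forms(table):
    """Naive name variants of a table, longest first: 'users' -> ['users', 'user']."""
    forms = [table]
    if table.endswith("s"):
        forms.append(table[:-1])
    return forms

def learn_suffixes(explicit_fks):
    """Count suffixes left after removing the target's name from each (column, table) FK."""
    suffixes = Counter()
    for column, target in explicit_fks:
        for stem in stem_forms(target):
            if column.startswith(stem) and len(column) > len(stem):
                suffixes[column[len(stem):]] += 1
                break
    return suffixes

def suggest_references(columns, tables, suffixes):
    """Apply learned suffixes (most frequent first) to columns without a declared FK."""
    suggestions = []
    for column in columns:
        for suffix, _count in suffixes.most_common():
            if not column.endswith(suffix):
                continue
            stem = column[: -len(suffix)]
            matches = [t for t in tables if stem in stem_forms(t)]
            if matches:
                suggestions.append((column, matches[0]))
                break
    return suggestions

explicit = [("user_id", "users"), ("account_id", "accounts"), ("region_code", "regions")]
learned = learn_suffixes(explicit)
print(learned)  # Counter({'_id': 2, '_code': 1})

tables = ["users", "accounts", "regions", "customers", "stores"]
print(suggest_references(["customer_id", "store_code", "notes"], tables, learned))
# [('customer_id', 'customers'), ('store_code', 'stores')]
```

Because store_code is only suggested after the schema itself used region_code as a foreign key, the convention comes from your data rather than a fixed list.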
Data Type Compatibility
Relationship candidates are filtered by data type compatibility. A VARCHAR column is unlikely to reference an INTEGER primary key.
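A type filter can be as simple as grouping declared SQL types into families and discarding pairs from different families. The family table below is a hypothetical and incomplete mapping (for example, it does not handle PostgreSQL's "character varying"), so treat it as a sketch.

```python
# Hypothetical grouping of SQL type names into comparable families.
TYPE_FAMILIES = {
    "integer": {"INT", "INTEGER", "BIGINT", "SMALLINT"},
    "text": {"VARCHAR", "CHAR", "TEXT"},
    "uuid": {"UUID"},
}

def type_family(sql_type):
    """Map a declared type such as 'varchar(36)' to its family, or None if unknown."""
    base = sql_type.upper().split("(")[0].strip()
    for family, names in TYPE_FAMILIES.items():
        if base in names:
            return family
    return None

def is_type_compatible(child_type, parent_type):
    """Keep a candidate only if both columns fall in the same known family."""
    family = type_family(child_type)
    return family is not None and family == type_family(parent_type)

candidates = [
    ("transactions.user_id", "BIGINT", "users.id", "INTEGER"),
    ("orders.user_ref", "VARCHAR(36)", "users.id", "INTEGER"),
]
kept = [c for c in candidates if is_type_compatible(c[1], c[3])]
print([c[0] for c in kept])  # ['transactions.user_id']
```

Unknown types return None and are rejected here; a more forgiving tool might keep them and let value overlap decide.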
Cardinality Analysis
By analyzing value distributions, automated tools can distinguish between one-to-many, many-to-many, and one-to-one relationships.
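The idea can be sketched by checking whether shared values repeat on either side of a candidate pair. This is a simplified, undirected classification under assumed inputs (plain Python lists of column values); a real tool would also report which side is unique and would usually work from samples.

```python
from collections import Counter

def classify_cardinality(child_values, parent_values):
    """Guess the relationship type from how often shared values repeat on each side."""
    child_counts = Counter(v for v in child_values if v is not None)
    parent_counts = Counter(v for v in parent_values if v is not None)
    shared = child_counts.keys() & parent_counts.keys()
    if not shared:
        return "no overlap"
    child_repeats = any(child_counts[v] > 1 for v in shared)
    parent_repeats = any(parent_counts[v] > 1 for v in shared)
    if child_repeats and parent_repeats:
        return "many-to-many"
    if child_repeats or parent_repeats:
        return "one-to-many"
    return "one-to-one"

print(classify_cardinality([1, 1, 2, 3], [1, 2, 3]))  # one-to-many
print(classify_cardinality([1, 2], [1, 2, 3]))        # one-to-one
print(classify_cardinality([1, 1, 2], [1, 1, 2]))     # many-to-many
```

A many-to-many result between two columns means neither is a key, which often points to a junction table or to both columns referencing some third table.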
Value Overlap Scoring
Statistical analysis of value overlap between columns provides confidence scores for potential relationships, ranking candidates from most to least likely.
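One common overlap measure is containment: the share of distinct child values that also appear in the parent column. The article does not say which statistic Squish uses, so the scoring function below is an assumed, illustrative choice.

```python
def overlap_score(child_values, parent_values):
    """Fraction of distinct non-NULL child values that also appear in the parent."""
    child = {v for v in child_values if v is not None}
    parent = {v for v in parent_values if v is not None}
    if not child:
        return 0.0
    return len(child & parent) / len(child)

def rank_candidates(candidates):
    """Sort (name, child_values, parent_values) tuples by overlap, highest first."""
    scored = [(name, overlap_score(c, p)) for name, c, p in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)

user_ids = [1, 2, 3, 4]
ranked = rank_candidates([
    ("transactions.user_id -> users.id", [1, 2, 2, 3], user_ids),
    ("transactions.amount -> users.id", [5, 9, 12], user_ids),
    ("orders.store_id -> users.id", [1, 7, 8, 9], user_ids),
])
for name, score in ranked:
    print(f"{score:.2f}  {name}")
```

Small integer keys overlap by accident easily, which is why even a high score should be reviewed before it is treated as a relationship.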
Best Practices for Relationship Discovery
1. Start with What You Know
Document explicit foreign keys first. These serve as ground truth for validating automated discovery results.
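In SQLite, declared foreign keys can be listed with PRAGMA foreign_key_list, whose rows are (id, seq, table, from, to, on_update, on_delete, match). PostgreSQL and most other engines expose the same information through their system catalogs or information_schema views. The sketch below assumes simple, trusted table names; the helper name is hypothetical.

```python
import sqlite3

def explicit_foreign_keys(conn):
    """Return (table, column, ref_table, ref_column) for every declared foreign key."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    fks = []
    for table in tables:
        # Each row: (id, seq, table, from, to, on_update, on_delete, match).
        for row in conn.execute(f"PRAGMA foreign_key_list({table})"):
            fks.append((table, row[3], row[2], row[4]))
    return fks

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES users(id));
    CREATE TABLE transactions (id INTEGER PRIMARY KEY, user_id INTEGER);
""")
print(explicit_foreign_keys(conn))  # [('orders', 'user_id', 'users', 'id')]
```

Here orders.user_id is ground truth, while transactions.user_id is the kind of implicit reference that automated discovery should then be checked against.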
2. Involve Domain Experts
Technical analysis should be validated by people who understand the business context. A 90% value overlap might indicate a relationship or might be coincidental.
3. Document Everything
Once relationships are discovered, document them. Whether through foreign key constraints, metadata catalogs, or ERD documentation, make implicit knowledge explicit.
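For relationships you have confirmed, one way to make them explicit is to generate constraint statements for review. The sketch below emits standard ALTER TABLE ... ADD CONSTRAINT syntax, which PostgreSQL accepts but SQLite does not; the fk_<table>_<column> naming scheme is an assumption.

```python
def foreign_key_ddl(table, column, ref_table, ref_column):
    """Build a statement that turns a confirmed implicit relationship into a constraint."""
    # Hypothetical naming scheme; identifiers are not quoted, so pass trusted names only.
    constraint = f"fk_{table}_{column}"
    return (
        f"ALTER TABLE {table} ADD CONSTRAINT {constraint} "
        f"FOREIGN KEY ({column}) REFERENCES {ref_table} ({ref_column});"
    )

confirmed = [("transactions", "user_id", "users", "id")]
for relationship in confirmed:
    print(foreign_key_ddl(*relationship))
# ALTER TABLE transactions ADD CONSTRAINT fk_transactions_user_id FOREIGN KEY (user_id) REFERENCES users (id);
```

Adding the constraint will fail if orphaned values exist, so it is worth running an orphan check first.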
4. Automate Ongoing Discovery
Schemas evolve. Set up automated discovery to run periodically, catching new implicit relationships as they emerge.
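When discovery runs on a schedule, comparing each run with the previous snapshot keeps the review queue limited to what changed. This sketch assumes each run produces a set of (child column, parent column) pairs; scheduling itself (cron, an orchestrator) and snapshot storage are left out.

```python
def diff_relationships(previous, current):
    """Compare two discovery snapshots, each a set of (child, parent) column pairs."""
    return {
        "new": sorted(current - previous),
        "gone": sorted(previous - current),
    }

last_run = {("transactions.user_id", "users.id")}
this_run = {("transactions.user_id", "users.id"), ("orders.user_id", "users.id")}
print(diff_relationships(last_run, this_run))
# {'new': [('orders.user_id', 'users.id')], 'gone': []}
```

Relationships that disappear between runs are worth a look too, since they can signal a dropped column or a change in how data is populated.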
Getting Started with Squish
Squish automates the entire relationship discovery process. Connect your database, run a 60-second analysis, and get a complete picture of both explicit and implicit relationships with confidence scores.
The result is an interactive ERD that shows all relationships, color-coded by confidence level, exportable to multiple formats for documentation and team collaboration.
Automated discovery provides comprehensive results in minutes, freeing your team to focus on building data products instead of mapping schemas.