
You Do Not Need to Connect Every Database to Get Value from Squish

Squish Team
February 1, 2026
7 min read

We hear the same objection in almost every security review: "We cannot give a vendor access to our production microservice databases." Fair enough. But here is the thing: you do not need to.

Most of the relationship discovery value Squish delivers comes from your data warehouse alone. You can connect just Snowflake, BigQuery, Databricks, or Redshift and get a comprehensive relationship map without ever touching a production operational database.

The Objection We Hear Most

Security teams are right to push back on connecting production databases to third-party tools. Those databases handle live traffic, contain sensitive data, and any misconfiguration carries real risk. The default answer should be "no" until someone demonstrates clear value.

We agree with that default, which is why we designed Squish to deliver the majority of its value from your warehouse alone.

Tier 1: Warehouse Only

Connect just your data warehouse. Squish reads information_schema, the same system catalog metadata it reads from any database. No row data, no PII, no application data.
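For context, information_schema is a standard catalog view that every major warehouse exposes. The snippet below is an illustrative sketch of the kind of metadata-only read this tier involves, not Squish's published query:

```python
# Illustrative only: the shape of a metadata query a warehouse-only
# discovery pass might issue. information_schema.columns is the
# standard SQL catalog view; no table rows are ever read by it.
METADATA_QUERY = """
SELECT table_schema, table_name, column_name, data_type
FROM information_schema.columns
ORDER BY table_schema, table_name, ordinal_position
"""
print("information_schema.columns" in METADATA_QUERY)
```

A read like this returns schema structure only: table names, column names, and types.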

What you get from warehouse-only discovery:

Full relationship map of your analytical layer. Every explicit and implicit relationship between your warehouse tables, scored by confidence. If you have 200 dbt models in Snowflake, Squish finds the relationships between staging models, intermediate models, and marts that nobody documented.

Undocumented join validation. Your analysts write JOINs based on tribal knowledge. Squish confirms which ones are backed by real schema patterns and flags the ones that might be wrong.

Semantic layer bootstrap. Export discovered relationships as dbt MetricFlow YAML, Snowflake Cortex semantic model definitions, or Databricks format. Instead of defining every entity and relationship by hand, start from what actually exists.
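To make the idea concrete, here is a minimal sketch of one common metadata-only heuristic: treating a column named like `<table>_id` as a candidate relationship to a table of that name. This is an illustration of the general technique, not Squish's actual scoring algorithm, and the sample columns are hypothetical:

```python
# Sketch of name-based relationship discovery over catalog metadata.
# Input is (table, column) pairs as they would come from
# information_schema.columns; no row data is involved.
def candidate_relationships(columns):
    tables = {table for table, _ in columns}
    candidates = []
    for table, column in columns:
        if column.endswith("_id"):
            base = column[:-3]                 # "customer_id" -> "customer"
            for target in (base, base + "s"):  # naive singular/plural match
                if target in tables and target != table:
                    candidates.append((table, column, target))
    return candidates

cols = [
    ("customers", "id"),
    ("orders", "id"),
    ("orders", "customer_id"),
    ("payments", "order_id"),
]
print(candidate_relationships(cols))
# [('orders', 'customer_id', 'customers'), ('payments', 'order_id', 'orders')]
```

A production tool layers confidence scoring on top of heuristics like this, which is why the false-positive review step in the case study below matters.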

A data team with 200+ dbt models connected their Snowflake warehouse to Squish. Discovery found 40 undocumented relationships between staging and mart models in under a minute. Their senior analytics engineer reviewed the results and confirmed 38; the other two were false positives from coincidental naming. The 38 confirmed relationships had existed only in application code and tribal knowledge.

That is warehouse-only. No production database access required.

Tier 2: Warehouse + Operational Databases

If Tier 1 convinces you, the next step is connecting your operational databases: the PostgreSQL and MySQL instances that run your microservices. The access pattern is the same: read-only information_schema queries, no row data.

What this unlocks:

Cross-database relationship mapping. A customer_id in your Snowflake warehouse traces back to the users table in your operational PostgreSQL database. Squish maps these cross-database relationships that no single-database tool can find.

ETL lineage validation. When data moves from operational databases to your warehouse through Fivetran, Airbyte, or custom pipelines, relationships can get lost. Foreign keys defined in PostgreSQL become implicit relationships in Snowflake. Squish finds what the ETL pipeline silently dropped.

Full-stack data map. Instead of understanding your warehouse in isolation, you see how data flows from source systems through to analytical models. This is the complete picture that data governance teams need.
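The ETL lineage check above reduces to a set comparison: foreign keys declared in the source database versus relationships still visible in the warehouse. A minimal sketch, using hypothetical table and column names:

```python
# Sketch of ETL lineage validation: anything declared as a foreign key
# in the operational database but absent from the warehouse's discovered
# relationships was silently dropped by the sync. Data is hypothetical.
pg_foreign_keys = {
    ("orders", "customer_id", "users"),
    ("invoices", "order_id", "orders"),
    ("refunds", "invoice_id", "invoices"),
}
warehouse_relationships = {
    ("orders", "customer_id", "users"),
    ("invoices", "order_id", "orders"),
}

dropped = pg_foreign_keys - warehouse_relationships
print(sorted(dropped))
# [('refunds', 'invoice_id', 'invoices')]
```

Each dropped relationship is a candidate for a downstream JOIN that nobody is validating.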

A fintech company connected their operational PostgreSQL database and Snowflake warehouse. Discovery showed that 12 foreign keys from PostgreSQL had become implicit relationships in Snowflake after their Fivetran sync. Three relationships were being joined incorrectly in downstream dbt models because an analyst assumed one-to-many when the actual cardinality was many-to-many. They caught a reporting bug that had been inflating numbers for two months.

Tier 3: Statistical Sampling

Tiers 1 and 2 use only information_schema metadata. Tier 3 adds one capability: COUNT and COUNT(DISTINCT) queries on your actual tables. No row-level reads. No row data. Just aggregate counts.

What this unlocks:

Confidence scores on ambiguous matches. When two columns share a name but one has high value overlap with the candidate parent table and the other has minimal overlap, you need to know which is the real relationship. Statistical sampling provides that signal.

Cardinality analysis. Is this relationship one-to-many or many-to-many? The answer changes how you write JOINs and how your semantic layer handles aggregations. COUNT and COUNT(DISTINCT) reveal the cardinality pattern without Squish ever seeing the actual values.

Disambiguation of common column names. A column named status_code appears in 8 tables. Metadata analysis alone suggests all 8 might relate to a statuses reference table. Statistical sampling reveals that only 3 have significant value overlap with the statuses table. The other 5 use status_code for different purposes. Without sampling, you get 8 candidates. With sampling, you get 3 confirmed relationships and 5 correctly filtered out.

The statistical queries are minimal:

SELECT COUNT(*), COUNT(DISTINCT status_code) FROM orders;

We count values and count distinct values. We never see the actual values. The database returns two numbers. From those numbers, we calculate cardinality ratios and overlap scores.
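Here is a runnable sketch of that arithmetic, using SQLite in place of a real warehouse. The table, the sample rows, the 0.4 ratio, and the labels are all illustrative; only the two aggregate numbers ever leave the database:

```python
# Runnable sketch: the two numbers returned by COUNT / COUNT(DISTINCT)
# are enough to estimate a column's cardinality pattern.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status_code TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "NEW"), (2, "NEW"), (3, "SHIPPED"), (4, "NEW"), (5, "SHIPPED")],
)

total, distinct = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT status_code) FROM orders"
).fetchone()

ratio = distinct / total  # 2 / 5 = 0.4
label = "unique (one-to-one side)" if ratio == 1.0 else "repeating (many side)"
print(total, distinct, round(ratio, 2), label)
# 5 2 0.4 repeating (many side)
```

A ratio of 1.0 means every value is unique, the signature of a primary key; a low ratio means the column sits on the "many" side of a relationship.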

What We Never Need

At every tier, these guarantees hold:

No SELECT * on your tables. We do not read rows. We do not sample data values. We do not access PII, financial records, or any application data.

No write access. We never create tables, insert rows, modify schemas, or execute DDL. The read-only user pattern enforces this at the database permission level.

No stored procedures or function access. We do not read your application logic, business rules, or custom functions.

Even at Tier 3, the most permissive level, we run COUNT and COUNT(DISTINCT). That is the extent of our data-level access. The read-only user you create for Squish can be restricted to exactly these operations, and we provide setup guides for every supported database.

Starting Small

Here is the practical path:

Step 1. Connect your data warehouse. Just one database. Run a 60-second discovery. See what Squish finds in your Snowflake, BigQuery, or Redshift schema.

Step 2. Review the results. Are the discovered relationships real? Are there surprises? Share the results with your analytics team and see if they match the tribal knowledge.

Step 3. If the results are useful, decide whether to expand. Add an operational database. Or enable statistical sampling. Each step is incremental and reversible.

There is no commitment required at any stage. You do not need to connect every database on day one. You do not need to grant statistical query access to get value. Start with your warehouse. See what it finds. Expand when and if you are ready.

Security teams are right to be cautious about production database access. But read-only information_schema access to a data warehouse carries minimal risk. That alone is enough to discover relationships that data teams have been mapping manually for months.

Tags: data warehouse, database security, relationship discovery, Snowflake, BigQuery, metadata access, tiered access
