
The Semantic Layer, Explained for People Who Actually Build Data Pipelines

Squish Team
January 29, 2026
7 min read

Every data vendor has a semantic layer pitch now. It involves diagrams with arrows, the phrase "single source of truth," and promises of self-serve analytics. Here is what a semantic layer actually is in practical terms for data engineers.

What It Is

A semantic layer is a mapping between your physical database schema and the business concepts your organization cares about. That is it.

Your database has tables like dim_customers, fact_orders, and stg_payments. Your CFO thinks in terms of "revenue," "churn rate," and "customer lifetime value." The semantic layer bridges that gap by defining how business concepts map to SQL.

In concrete terms, it is a configuration file (usually YAML) that says things like:

  • "Revenue" means SUM(fact_orders.amount) grouped by order_date
  • "Active customer" means a customer with at least one order in the last 90 days
  • "Churn rate" means the percentage of customers who were active last quarter but not this quarter

The semantic layer tool then generates the appropriate SQL for whatever downstream tool is asking the question.
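In MetricFlow-flavored YAML, those definitions might look something like the sketch below. The syntax is approximated rather than copied from any one tool, and the table and column names (fact_orders, amount, order_date) follow the examples above:

```yaml
# Sketch of a semantic-layer config (MetricFlow-style, syntax approximated).
# Exact keys vary by tool; the point is that the business definition lives
# in one declarative file rather than in each BI tool's query builder.
semantic_models:
  - name: orders
    model: ref('fact_orders')      # the physical table behind the concept
    dimensions:
      - name: order_date
        type: time
    measures:
      - name: amount
        agg: sum

metrics:
  - name: revenue
    description: "Total order amount, summed by order date."
    type: simple
    type_params:
      measure: amount
```

A downstream tool asking for "revenue by month" would compile this into a `SELECT SUM(amount) ... GROUP BY DATE_TRUNC('month', order_date)` against fact_orders, so every consumer gets the same SQL.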

What It Is Not

It is not a replacement for your data warehouse. It does not store data. It does not run transformations. It sits between your warehouse and your BI tools, translating business questions into SQL.

It is also not a silver bullet for self-serve analytics. Business users still need to understand what questions they can ask and how to interpret the answers. The semantic layer makes the SQL part consistent, not the thinking part.

When It Actually Helps

Multiple BI Tools Hitting the Same Warehouse

If your company uses Looker for dashboards, Hex for ad-hoc analysis, and a Python notebook environment for data science, each tool has its own way of defining metrics. "Revenue" in Looker might be calculated differently than "revenue" in a notebook. A semantic layer centralizes the definition so all three tools compute the same number.

Metric Definitions That Keep Drifting

When three analysts define "monthly active users" three different ways, you have a semantic layer problem whether you know it or not. Centralizing the definition and forcing all queries through it eliminates drift.

AI Agents Querying Your Data

This is the newer and arguably more compelling use case. When an LLM-based agent needs to answer "what was revenue last quarter," it needs to know which table to query, how to calculate the metric, and what filters to apply. A semantic layer provides exactly this mapping in a format that AI agents can consume programmatically.
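What an agent consumes is the same kind of declarative definition, typically enriched with descriptions and synonyms so the model can map natural-language phrasing onto the right SQL. The field names below are illustrative rather than any specific tool's schema (Cortex Analyst, MetricFlow, and Genie each use their own keys):

```yaml
# Hypothetical agent-facing metric definition. The description and
# synonyms give the LLM the vocabulary to match "what was revenue
# last quarter" to this metric; the time dimension lets it translate
# "last quarter" into a concrete date filter.
metrics:
  - name: revenue
    description: "Sum of fact_orders.amount. Use for questions about sales, income, or revenue."
    synonyms: ["sales", "income", "turnover"]
    sql: SUM(fact_orders.amount)
    time_dimension: order_date
```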

When It Is Overkill

If you have one BI tool and a small data team that communicates well, a semantic layer adds complexity without proportional benefit. Your dbt models already serve as a de facto semantic layer. The definitions live in your transformation logic, and everyone on the team knows where to find them.

If your metric definitions are stable and unambiguous, the problem a semantic layer solves does not exist for you yet.

The Practical Challenge

The hardest part of adopting a semantic layer is not the technology. It is the organizational work of agreeing on metric definitions. Defining "revenue" requires getting sales, finance, and product to agree on what counts. Is it gross or net? Does it include refunds? What about free trials that convert?

No tool solves this for you. The semantic layer gives you a place to encode the answer once the humans have worked it out.
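Once those decisions are made, encoding them is the easy part. For example, if stakeholders agree that "revenue" is net of refunds and excludes free trials, the decision might be captured like this (column names such as status and is_trial are illustrative, not from any real schema):

```yaml
# The agreed-upon definition, encoded once. Every dashboard and query
# that asks for net_revenue now inherits these exclusions.
metrics:
  - name: net_revenue
    description: "Order amount net of refunds; excludes free-trial orders."
    sql: SUM(fact_orders.amount)
    filters:
      - fact_orders.status != 'refunded'
      - fact_orders.is_trial = false
```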

Where Squish Fits In

Squish sits upstream of the semantic layer. Before you can define "revenue" as SUM(fact_orders.amount), you need to know that fact_orders exists, that it relates to dim_customers through user_id, and that amount is the right column.

Squish discovers these relationships automatically and can export the result as a semantic model definition for dbt MetricFlow, Snowflake Cortex, or Databricks. Think of it as the step that generates the first draft of your semantic layer based on what actually exists in your database, instead of making you define every entity and relationship by hand.

Choosing a Semantic Layer Tool

The three main options as of early 2026:

dbt MetricFlow is the natural choice if you already use dbt for transformations. Metrics are defined in YAML alongside your models. The integration is tight, and the ecosystem is mature.

Snowflake Cortex Analyst is tightly integrated with Snowflake and particularly strong for AI agent use cases. If your warehouse is Snowflake and you want LLMs to query your data, this is the most direct path.

Databricks Genie is Databricks' answer. If you are on the Databricks platform and use Unity Catalog, it fits naturally into that ecosystem.

All three are solid. The right choice depends more on your existing data stack than on feature comparisons. Pick the one that matches where your data already lives.

Getting Started

If you are considering a semantic layer, start small:

  • Pick your five most contentious metrics, the ones where different teams report different numbers.
  • Get stakeholders to agree on definitions. This is the hard part.
  • Encode those definitions in whichever tool matches your stack.
  • Route your most-used dashboards through the semantic layer.
  • Expand coverage as you build confidence in the approach.

Five well-defined metrics that everyone trusts are worth more than 500 that nobody has validated.

Tags: semantic layer, dbt semantic layer, MetricFlow, data modeling, data engineering, metrics layer

Ready to discover your database relationships?

Stop spending weeks manually mapping relationships. Squish discovers them in 60 seconds with 95%+ accuracy.