Technical

How Semantic Layers Work on the Backend

Squish Team
January 29, 2026
8 min read

Semantic layers are marketed as a business-user feature, but they are fundamentally a backend engineering problem. A semantic layer takes a high-level question like "what was revenue last quarter" and generates the correct SQL to answer it. Understanding how that translation works is essential for building one that actually produces trustworthy results.

The Translation Layer

A semantic layer is a compiler. It takes input in one language (business questions or metric definitions) and produces output in another (SQL). The quality of the output depends entirely on the quality of the intermediate representation: the semantic model.

Question → Semantic Layer → Generated SQL

The semantic model defines three things:

  • Entities: the nouns in your data (customers, orders, products)
  • Measures: the numbers you can compute (revenue, count of orders, average deal size)
  • Dimensions: the ways you can slice the numbers (by date, by region, by product category)

When a consumer asks "revenue by quarter," the semantic layer looks up the definition of "revenue" (a measure), determines which table it comes from, identifies the time dimension, and generates a GROUP BY query with the correct aggregation.
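
That lookup-and-generate step can be sketched in a few lines. A toy illustration only: the model shape and the `compile_query` helper are invented for this post, not any real engine's API.

```python
# Toy semantic model: one measure. The shape of this dict and the
# compile_query helper are illustrative, not a real engine's API.
SEMANTIC_MODEL = {
    "measures": {"revenue": {"table": "fct_orders", "expr": "SUM(amount)"}},
}

def compile_query(measure: str, time_dim: str, grain: str) -> str:
    """Look up the measure, apply the time grain, emit a GROUP BY query."""
    m = SEMANTIC_MODEL["measures"][measure]
    bucket = f"DATE_TRUNC('{grain}', {time_dim})"
    return (
        f"SELECT {bucket} AS {time_dim}__{grain}, {m['expr']} AS {measure}\n"
        f"FROM {m['table']}\n"
        f"GROUP BY {bucket}"
    )

print(compile_query("revenue", "order_date", "quarter"))
```

Everything interesting lives in the lookup: the question names a measure and a grain, and the model supplies the table, the expression, and the aggregation.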

Architecture of a Semantic Layer

If you peel back the marketing diagrams, the actual architecture is straightforward.

Consumers (Looker, Hex, AI agents, Python SDKs)
        ↓
Semantic Layer (MetricFlow / Cortex / Genie)
        ↓
Data Warehouse (Snowflake / BigQuery / Databricks)

Consumers sit at the top -- BI tools, notebooks, AI agents, custom applications. They send queries in whatever high-level format they prefer. Looker sends LookML. An AI agent sends a natural language question that gets parsed into a structured metric request.

The semantic engine sits in the middle. MetricFlow, Cortex, Genie -- pick your flavor. It takes the structured request, resolves it against the semantic model, and spits out SQL.

The data warehouse sits at the bottom and has no idea any of this is happening. It receives SQL, runs it, returns rows. The simplicity of this bottom layer is the whole point.

So the interesting engineering is all in the middle. Here is how two major implementations handle it.

How MetricFlow Generates SQL

MetricFlow is the engine behind the dbt Semantic Layer. When you define a metric in dbt, MetricFlow handles the SQL generation.

A metric definition looks like this:

```yaml
semantic_models:
  - name: orders
    defaults:
      agg_time_dimension: order_date
    entities:
      - name: order_id
        type: primary
      - name: customer_id
        type: foreign
    measures:
      - name: revenue
        agg: sum
        expr: amount
      - name: order_count
        agg: count
        expr: order_id
    dimensions:
      - name: order_date
        type: time
      - name: status
        type: categorical
```

When you query revenue grouped by quarter, MetricFlow:

  • Finds the semantic model that defines revenue (orders)
  • Resolves the measure expression (SUM(amount))
  • Identifies the time dimension (order_date) and applies the grain (quarter)
  • Generates the SQL:

```sql
SELECT
  DATE_TRUNC('quarter', order_date) AS metric_time__quarter,
  SUM(amount) AS revenue
FROM analytics.production.fct_orders
GROUP BY DATE_TRUNC('quarter', order_date)
ORDER BY metric_time__quarter
```

The power shows when queries span multiple entities. If you ask for revenue by customer_region, MetricFlow traces the foreign key from orders to customers, generates the JOIN, and includes the region dimension:

```sql
SELECT
  c.region AS customer__region,
  SUM(o.amount) AS revenue
FROM analytics.production.fct_orders o
JOIN analytics.production.dim_customers c
  ON o.customer_id = c.customer_id
GROUP BY c.region
```

MetricFlow needs to know the relationship between orders and customers to generate that JOIN. This is where the entity definitions and their types (primary, foreign) matter. Miss a relationship and MetricFlow cannot traverse from one entity to another.
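
The traversal itself is simple once the relationships exist. A minimal sketch, with a hypothetical relationship map standing in for the engine's entity resolution (none of these names are MetricFlow's real internals):

```python
# Hypothetical relationship map: (fact table, foreign-key entity) -> the
# dimension table where that entity is the primary key.
RELATIONSHIPS = {
    ("fct_orders", "customer_id"): "dim_customers",
}

def join_clause(base: str, entity: str) -> str:
    """Follow a declared foreign key from the base table to its home table."""
    target = RELATIONSHIPS.get((base, entity))
    if target is None:
        # No declared relationship means no traversal: the engine must
        # refuse rather than guess a join condition.
        raise ValueError(f"no declared path from {base} via {entity}")
    return f"JOIN {target} ON {base}.{entity} = {target}.{entity}"

print(join_clause("fct_orders", "customer_id"))
```

The failure mode is the important part: a missing entry does not produce a wrong JOIN, it produces no JOIN at all, which is exactly the blind spot described above.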

How Cortex Analyst Differs

Snowflake Cortex Analyst takes a different approach. Instead of a programmatic metric API, it accepts natural language queries and uses an LLM to generate SQL based on a semantic model file.

The semantic model is a YAML file that describes your tables, but with business-friendly descriptions:

```yaml
tables:
  - name: fact_orders
    description: "All completed customer orders with revenue data"
    columns:
      - name: amount
        description: "Order total in USD after discounts"
      - name: order_date
        description: "Date the order was placed"
      - name: customer_id
        description: "References dim_customers"
    measures:
      - name: total_revenue
        expr: "SUM(amount)"
        description: "Total revenue in USD"
```

When a user asks "what was total revenue last quarter," Cortex:

  • Parses the natural language into an intent
  • Matches "total revenue" to the measure definition
  • Interprets "last quarter" relative to the current date
  • Generates SQL using the table and column definitions

The tradeoff is flexibility versus predictability. Users can phrase questions however they want, which is great for adoption. But the same question might generate slightly different SQL on different runs. The results should be equivalent, but "should be" makes some data engineers nervous, and reasonably so.
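
The "last quarter" step is plain date arithmetic once the intent is parsed. A self-contained sketch of how an engine might resolve it against the current date (not Cortex's actual implementation):

```python
from datetime import date, timedelta

def last_quarter(today: date) -> tuple[date, date]:
    """Start and end dates of the calendar quarter before today's quarter."""
    q = (today.month - 1) // 3            # 0-based index of current quarter
    cur_start = date(today.year, q * 3 + 1, 1)
    if q == 0:                            # previous quarter crosses the year
        prev_start = date(today.year - 1, 10, 1)
    else:
        prev_start = date(today.year, (q - 1) * 3 + 1, 1)
    # Last quarter ends the day before the current quarter starts.
    return prev_start, cur_start - timedelta(days=1)

print(last_quarter(date(2026, 1, 29)))  # Q4 2025: Oct 1 through Dec 31
```

The resolved date range then becomes a WHERE clause on the order date, which is why the engine needs to know which column is the time dimension in the first place.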

What the Semantic Layer Needs From Your Schema

Despite the architectural differences, MetricFlow and Cortex need the same raw ingredients from your schema:

Table and column inventory

Every table, every column, every data type. This is the boring but essential baseline. You cannot query what the semantic layer does not know about, and a missing table is a silent gap that nobody notices until the CFO asks a question that returns no results.

Relationships between entities

This one trips people up the most. JOINs are the hardest part of SQL generation. The semantic layer needs to know that customer_id in the orders table references customer_id in the customers table, and whether that is many-to-one or many-to-many. Get it wrong and your aggregations silently double-count.
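
The double-counting failure is easy to reproduce in miniature. A toy example in plain Python standing in for a SQL join, where one order fans out across two rows of a child table:

```python
# One order, two child rows (say, one per tag). Joining and then summing
# the order amount repeats it once per matched row.
orders = [{"order_id": 1, "amount": 100}]
order_tags = [{"order_id": 1, "tag": "new"}, {"order_id": 1, "tag": "promo"}]

joined = [o | t for o in orders for t in order_tags
          if o["order_id"] == t["order_id"]]
naive_revenue = sum(row["amount"] for row in joined)

print(naive_revenue)  # 200 -- double the true 100, one copy per tag row
```

A semantic layer that knows the relationship is one-to-many can aggregate before joining (or refuse the fan-out entirely); one that has the cardinality wrong happily returns the 200.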

Business definitions

Which column is "revenue"? What filter defines "active customer"? No amount of schema metadata answers these questions. This is the part that requires humans sitting in a room until everyone agrees. The semantic layer just gives you a place to write down the answer.
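
Once the room agrees, the answer gets written down next to the schema. A hedged sketch of what that might look like as a filtered metric in MetricFlow-style YAML (the metric name, measure, and dimension are invented for illustration; check your tool's docs for the exact filter syntax):

```yaml
metrics:
  - name: active_customer_count
    description: "Distinct customers whose status is active"
    type: simple
    type_params:
      measure: customer_count
    filter: |
      {{ Dimension('customer__status') }} = 'active'
```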

Time dimensions

Nearly every metric query involves time, and time is surprisingly tricky. The semantic layer needs to know which columns are temporal, what grains make sense, and how your fiscal calendar maps to the regular one. A "quarterly revenue" number is wrong if the tool thinks Q1 starts in January but your finance team starts it in February.
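
Here is that February fiscal calendar as a worked example. A small sketch, with the February start as an assumption for illustration, mapping calendar months to fiscal quarters:

```python
FISCAL_YEAR_START_MONTH = 2  # assumption: fiscal year starts in February

def fiscal_quarter(month: int) -> int:
    """1-based fiscal quarter for a 1-based calendar month."""
    months_into_fy = (month - FISCAL_YEAR_START_MONTH) % 12
    return months_into_fy // 3 + 1

print(fiscal_quarter(2))  # 1: February opens fiscal Q1
print(fiscal_quarter(1))  # 4: January closes the *prior* fiscal year
```

A tool that assumes calendar quarters would put January revenue in Q1; finance puts it at the tail end of Q4. Both numbers are "quarterly revenue," and only one matches the board deck.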

The Metadata Bootstrapping Problem

Building a semantic model from scratch is tedious. For a database with 100 tables, you need to document every entity, every relationship, every measure, and every dimension. Most teams start enthusiastically and run out of energy around table 30.

Squish exists to skip that mechanical first phase. Connect your database and it discovers:

  • Every table and column across your connected databases
  • All relationships, including the implicit ones your ORM created without constraints
  • Primary key and foreign key classification
  • Data types that help distinguish measures from dimensions

Export the results as a semantic model YAML and you have a first draft that covers the structural metadata. You still need to add business definitions (what "revenue" means, what "active" means), but you skip the mechanical work of mapping tables, columns, and relationships by hand.

Relationship Discovery Is the Foundation

The most critical piece of metadata for a semantic layer is the relationship graph. Without it, the semantic engine cannot generate JOINs, which means it cannot answer any question that spans multiple tables. And most interesting business questions span multiple tables.

Explicit foreign keys cover some relationships. But as we have written about before, production databases typically have 3-5x more implicit relationships than explicit ones. A semantic layer built only on explicit foreign keys has blind spots wherever implicit relationships exist.

Automated discovery finds both. The semantic layer gets a complete relationship graph, which means it can generate JOINs for every valid entity traversal, not just the ones someone remembered to document.
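
What does finding an implicit relationship look like mechanically? A deliberately naive sketch: treat a column as a candidate foreign key when its name matches a column in another table and its values are a subset of that column's values. Real discovery tools use much richer statistics; this only shows the shape of the problem.

```python
# Tiny sample: orders.customer_id has no declared constraint, but its
# values are contained in customers.customer_id.
tables = {
    "customers": {"customer_id": [1, 2, 3]},
    "orders": {"order_id": [10, 11], "customer_id": [1, 3]},
}

def candidate_fks(tables: dict) -> list[tuple[str, str, str]]:
    """Return (table, column, referenced_table) triples for likely FKs."""
    found = []
    for t, cols in tables.items():
        for col, values in cols.items():
            for other, other_cols in tables.items():
                if other == t or col not in other_cols:
                    continue
                # Name match + value containment -> candidate foreign key.
                if set(values) <= set(other_cols[col]):
                    found.append((t, col, other))
    return found

print(candidate_fks(tables))  # [('orders', 'customer_id', 'customers')]
```

Even this toy version surfaces the ORM-created relationship that no constraint documents, which is exactly the kind of edge the semantic layer needs in its graph.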

Getting Started

If you are building a semantic layer:

  • Start with automated schema discovery to get the structural metadata
  • Map your ten most-asked business questions to specific tables and calculations
  • Define those as metrics in your chosen tool (MetricFlow, Cortex, or Genie)
  • Test each metric against known-good results from your existing reports
  • Expand coverage as consumers adopt the semantic layer

The structural metadata is grunt work. The business definitions are where your team actually adds value. Automate the grunt work.

Tags: semantic layer, MetricFlow, Snowflake Cortex, SQL generation, data modeling, metrics, data engineering

Ready to discover your database relationships?

Stop spending weeks manually mapping relationships. Squish discovers them in 60 seconds with 95%+ accuracy.