
Why Your AI Agent Cannot Query Your Database (And How to Fix It)

Squish Team
January 22, 2026
6 min read

You have probably seen the demos. Someone types "what was revenue last quarter?" and an AI agent writes the perfect SQL query, runs it, and returns a chart. It looks effortless. Then you try it with your own database and the agent writes a query against a table that does not exist, joins on the wrong column, or defines "revenue" differently than your finance team does.

The demos work because the demo database has ten tables with obvious names. Your production database has 400 tables, inconsistent naming, and business logic that lives in tribal knowledge.

The Problem is Context, Not Intelligence

GPT-4, Claude, and the other large language models are perfectly capable of writing SQL. The syntax is not the issue. The issue is that the model does not know your schema, your naming conventions, your business definitions, or your table relationships.

When you ask "what was revenue last quarter?" the model needs to know:

  • Which table contains revenue data (is it orders, transactions, invoices, or revenue_summary?)
  • Which column represents the amount (amount, total, revenue, gross_amount, net_amount?)
  • What "last quarter" means in your fiscal calendar
  • Whether to include refunds, credits, or free trials
  • Which tables to join and on which columns

Without this context, the model is guessing. It might guess right on a simple schema. It will not guess right on yours.
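To make the ambiguity concrete, here is a small sketch (the `orders` table, its columns, and the data are invented for illustration) showing how two reasonable readings of "revenue" produce different answers from the same rows:

```python
import sqlite3

# Toy data: table and column names are hypothetical, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        amount REAL,
        status TEXT,
        refunded INTEGER
    )
""")
conn.executemany(
    "INSERT INTO orders (amount, status, refunded) VALUES (?, ?, ?)",
    [(100.0, "completed", 0),
     (250.0, "completed", 1),   # completed, but later refunded
     (75.0, "pending", 0)],     # not yet completed
)

# Guess 1: a naive agent sums every row it finds.
naive = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

# Guess 2: the finance definition excludes refunds and incomplete orders.
finance = conn.execute(
    "SELECT SUM(amount) FROM orders "
    "WHERE status = 'completed' AND refunded = 0"
).fetchone()[0]

print(naive)    # 425.0
print(finance)  # 100.0
```

Both queries are syntactically flawless. Only one of them matches what your finance team means, and nothing in the schema tells the model which.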

Why Cramming the Schema Into the Prompt Does Not Scale

The first approach everyone tries is dumping the schema into the system prompt. Give the model all table names, column names, and data types, and let it figure out the rest.

This works up to a point. For a database with 20-30 tables, the schema fits comfortably in the context window and the model can usually pick the right tables. Past 100 tables, you start hitting problems:

The context window fills up. A 400-table schema with column details can easily consume 50,000 tokens. That is a significant chunk of the context window, leaving less room for conversation history and reasoning.
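The 50,000-token figure is easy to sanity-check with back-of-envelope arithmetic. The numbers below are assumptions, not measurements: roughly 15 columns per table, ~35 characters per DDL line, and the common ~4-characters-per-token heuristic:

```python
# Back-of-envelope estimate of prompt cost for a raw schema dump.
# All inputs are hypothetical assumptions for illustration.
tables = 400
columns_per_table = 15
chars_per_line = 35      # e.g. "    gross_amount DECIMAL(10,2),"
chars_per_token = 4      # rough heuristic for English-like text

# +2 lines per table for the CREATE TABLE header and closing paren
schema_chars = tables * (columns_per_table + 2) * chars_per_line
schema_tokens = schema_chars // chars_per_token
print(schema_tokens)  # 59500
```

Even with generous rounding, a full dump lands in the tens of thousands of tokens before the conversation has started.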

The model gets confused. More tables means more ambiguity. When there are three tables with "customer" in the name, the model has to guess which one is appropriate. It guesses wrong often enough to erode trust.

You lose the business logic. A schema dump tells the model that an amount column exists and that it is a DECIMAL(10,2). It does not tell the model that "revenue" means SUM(amount) WHERE status = 'completed' AND refunded = false.

The Semantic Layer Solution

This is where semantic layers actually earn their keep. Instead of dumping raw schema into the prompt, you give the AI agent a semantic model that describes your data in business terms:

  • "Revenue" is defined as a specific calculation against specific tables with specific filters
  • "Customer" means a specific entity with a specific primary key
  • Relationships between entities are explicitly declared

The AI agent queries the semantic layer definition to understand what questions it can answer and how to compute the answers. The model does not need to understand your raw schema. It needs to understand your semantic model, which is much smaller and more precise.
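A minimal sketch of the idea, with an invented format (real tools each have their own syntax): the metric definition carries the table, the expression, and the filters, and a tiny compiler turns it into SQL, so the agent never has to guess any of those parts.

```python
# A hypothetical semantic model; the format is invented for illustration.
SEMANTIC_MODEL = {
    "metrics": {
        "revenue": {
            "table": "orders",
            "expression": "SUM(amount)",
            "filters": ["status = 'completed'", "refunded = 0"],
            "description": "Completed, non-refunded order amounts.",
        }
    }
}

def compile_metric(model, name):
    """Turn a metric definition into a concrete SQL query."""
    m = model["metrics"][name]
    where = " AND ".join(m["filters"]) or "1=1"
    return f"SELECT {m['expression']} FROM {m['table']} WHERE {where}"

sql = compile_metric(SEMANTIC_MODEL, "revenue")
print(sql)
# SELECT SUM(amount) FROM orders WHERE status = 'completed' AND refunded = 0
```

The agent's job shrinks from "infer the business logic" to "pick the right metric," which is a far easier problem.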

The Bootstrap Problem

Semantic layers solve the "AI cannot query my database" problem. But building a semantic layer requires understanding your schema in the first place. You need to know every table, every relationship, every metric definition before you can encode them.

For a small database, this is a weekend project. For a large one, it is months of work that nobody wants to do manually.

This is the specific problem Squish solves. Connect your database, run discovery, and get a complete relationship map with confidence scores. Export it as a semantic model definition. You get a first draft of your semantic layer in about a minute instead of building it from scratch.

The first draft will not be perfect. You still need to add business logic, name things in business-friendly terms, and validate the relationships. But starting from a complete, accurate schema map is dramatically faster than starting from nothing.

A Practical Path Forward

If you want AI agents that can reliably query your database, here is the sequence that works:

Step 1: Map Your Schema

Run automated discovery against every database the AI agent needs access to. Get a complete picture of tables, columns, relationships, and data types. Squish handles this.

Step 2: Define Your Metrics

Take the ten most common questions people ask about your data. Define the SQL that answers each one precisely. These become your first semantic layer definitions.
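In practice, this first pass can be as simple as a question-to-SQL catalog. The questions, tables, and columns below are hypothetical examples:

```python
# Step 2, sketched: common questions, each pinned to one precise SQL
# definition. Table and column names are hypothetical.
METRIC_DEFINITIONS = {
    "monthly revenue": (
        "SELECT strftime('%Y-%m', created_at) AS month, SUM(amount) "
        "FROM orders WHERE status = 'completed' AND refunded = 0 "
        "GROUP BY month"
    ),
    "active customers": (
        "SELECT COUNT(DISTINCT customer_id) FROM orders "
        "WHERE created_at >= date('now', '-30 days')"
    ),
}

for question, sql in METRIC_DEFINITIONS.items():
    print(f"{question}: {sql}")
```

The point is not the format; it is that each question has exactly one agreed-upon answer before any AI gets involved.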

Step 3: Choose a Semantic Layer Tool

dbt MetricFlow if you use dbt. Snowflake Cortex if you are on Snowflake. Databricks Genie if you are on Databricks. Encode your metric definitions in the appropriate format.

Step 4: Connect Your AI Agent

Give the agent access to the semantic model definition, not the raw schema. The agent uses the semantic model to understand what metrics exist, how they are defined, and what relationships connect them.
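Concretely, "give the agent the semantic model" often just means rendering the definitions into the agent's context instead of a schema dump. A hedged sketch, with invented metric names and fields:

```python
# Step 4, sketched: the agent's context is built from the semantic
# model, not from a raw schema dump. All names here are hypothetical.
SEMANTIC_MODEL = {
    "revenue": {
        "sql": ("SELECT SUM(amount) FROM orders "
                "WHERE status = 'completed' AND refunded = 0"),
        "description": "Completed, non-refunded order amounts.",
    },
    "active_customers": {
        "sql": ("SELECT COUNT(DISTINCT customer_id) FROM orders "
                "WHERE created_at >= date('now', '-30 days')"),
        "description": "Customers with an order in the last 30 days.",
    },
}

def build_agent_context(model):
    """Render the semantic model as system-prompt context for the agent."""
    lines = ["You may answer questions using ONLY these metrics:"]
    for name, m in model.items():
        lines.append(f"- {name}: {m['description']}")
        lines.append(f"  SQL: {m['sql']}")
    return "\n".join(lines)

print(build_agent_context(SEMANTIC_MODEL))
```

A two-metric context like this is a few hundred tokens, not fifty thousand, and every definition in it is one your team has already signed off on.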

Step 5: Validate and Iterate

Run the agent against known questions with known answers. Compare results. Fix the semantic model where the agent gets it wrong. Add new metrics as demand emerges.
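The validation loop can be a plain regression harness: golden questions with known answers, diffed against whatever SQL the agent produces. The agent is stubbed out below (in practice it would be an LLM call), and the table and data are invented:

```python
import sqlite3

# Step 5, sketched: run the agent's SQL for questions with known
# answers and diff the results. Data and names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (amount REAL, status TEXT, refunded INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(100.0, "completed", 0), (250.0, "completed", 1), (75.0, "pending", 0)],
)

GOLDEN = {
    "total revenue": 100.0,  # per the finance definition
}

def agent_sql(question):
    # Stub: pretend the agent compiled this through the semantic layer.
    return ("SELECT SUM(amount) FROM orders "
            "WHERE status = 'completed' AND refunded = 0")

failures = []
for question, expected in GOLDEN.items():
    got = conn.execute(agent_sql(question)).fetchone()[0]
    if got != expected:
        failures.append((question, expected, got))

print("failures:", failures)  # empty when the semantic model is right
```

Every time the agent gets a question wrong, the fix goes into the semantic model and the question goes into the golden set, so the same mistake cannot recur silently.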

What Good Looks Like

When this works well, the AI agent becomes a genuinely useful interface for data access. Business users ask questions in natural language. The agent translates them to SQL through the semantic layer. The results are consistent because the metric definitions are centralized.

It is not magic. It requires the upfront work of building and maintaining the semantic layer. But the alternative, every person writing their own SQL with their own interpretation of "revenue," is worse in every way.

The technology is ready. The models are good enough at SQL. The semantic layer tools are mature. The bottleneck is the schema knowledge required to build the semantic layer in the first place. Automate that step and the rest falls into place.

Tags: AI agents, text-to-SQL, LLM database access, semantic layer AI, natural language SQL, AI data access

Ready to discover your database relationships?

Stop spending weeks manually mapping relationships. Squish discovers them in 60 seconds with 95%+ accuracy.