A financial controller at a distribution company needed a weekly report: sales per region, margin per product category, comparison with the previous quarter. They either wrote the query themselves or waited for an analyst. With a text-to-SQL agent deployed, this time was reduced to tens of seconds. Not because the model "knows" the database. Because someone had done the proper work beforehand: described the schema, defined the business glossary, and set up guardrails that prevent the agent from doing anything destructive.
This is the real promise of text-to-SQL. That’s it. Nothing more.
How a text-to-SQL agent works: schema-aware prompting and tool loop
#A text-to-SQL agent is an agent with access to a database tool. Its workflow is as follows: the user asks a question in natural language, the agent builds context (database schema, business glossary, sample column values), generates an SQL query, validates it before execution, sends it to the database, retrieves the result, and formulates an answer along with the displayed query.
The key element is schema-aware prompting. With each question, the model’s context includes a schema description: table names, columns, data types, foreign keys, and column-to-business-term mappings. Without this mapping, the model doesn’t know that the net_rev_adj column means "adjusted net revenue" and that values are in thousands of PLN. Misinterpretation of units or names looks like a model error but is actually a configuration error.
The agent uses tool-use: instead of generating a query and executing it immediately, it calls the sql_query(statement, limit) tool. This tool has a hard row limit (typically 500-2000 depending on context), operates exclusively on a read-only connection, and logs every call with the full query text and timestamp.
For complex multi-step questions, the agent breaks the task into a sequence of queries: first retrieves aggregated data, then filters or joins it. This is the same ReAct loop as in a multi-step agent, only the tool is an SQL database instead of an API.
Guardrails that make it safe
#The text-to-SQL architecture alone, without guardrails, is a tool that can cause harm. Below, I describe four layers of protection we implement at Cashcrown for every deployment.
Layer 1: Read-only role at the database level. The agent’s connection to the database uses a role without INSERT, UPDATE, DELETE, or DROP permissions. Even if the model generates a data-modifying query (e.g., via prompt injection), the database rejects it with a permissions error. This is the only line of defense that works independently of all others.
Layer 2: Allowlist of tables and columns. The agent sees only those tables and columns explicitly permitted. Tables containing personal data (names, emails, customer IDs), HR data, or financially sensitive data are excluded from the schema provided to the model by default. Access to them requires a human decision, not a configuration change by the user.
Layer 3: Query validation before execution. Before each execution, a guardrail parses the generated SQL query and checks: whether it contains only SELECT statements, whether it references tables outside the allowlist, and whether the row limit is below the threshold. A query that fails validation goes to the log and escalation, not to the database.
Layer 4: Human-gate for low confidence and sensitive tables. When the model assesses the answer’s confidence as low (complex JOINs, ambiguous questions, unknown terms), the agent signals instead of answering: "This query requires human review." The same applies to any query touching tables marked as sensitive. Human-oversight is not optional for these two classes.
| Guardrail Layer | What It Checks | What Happens on Violation |
|---|---|---|
| Read-only role (database) | Permissions at the DB level | Permissions error, query rejected |
| Table allowlist | Whether the table is in the permitted set | Validation error, escalation to log |
| Query parser | SELECT-only, no DDL/DML | Blocked before sending to DB |
| Human-gate | Low confidence or sensitive table | Pause, request for operator approval |
Where text-to-SQL agents fail: realistic assessment
#The precision of text-to-SQL systems in production environments ranges from 70 to 85% on clean, well-documented schemas. On messy schemas (abbreviated column names, missing foreign keys, ambiguous types), it drops to 40-60%. At Cashcrown, we measure this on a golden set of 50-100 questions before every production deployment.
Three error categories appear most frequently.
Hallucinated columns and tables. The model may generate a query referencing a column that doesn’t exist in the schema, especially when the schema is large or incompletely described. The SQL-parsing guardrail catches this error before it reaches the database. Without this validator, the query triggers a database error, which the model may handle poorly.
Incorrect JOINs on complex relationships. A question requiring joins across five tables with partially described keys often results in a syntactically correct but semantically wrong query. It aggregates at the wrong granularity or omits a filter condition. The result looks plausible but is incorrect. The safeguard is forcing the agent to always show the generated query to the user before providing an answer.
Ambiguity in business terms. "Revenue," "margin," "customer" can have different definitions across departments in the same company. The business glossary in the semantic layer must unambiguously map these terms to specific columns and formulas. Without this mapping, errors are hard to detect because the numbers look reasonable. Details on preparing data for AI are covered in the article AI for Data Analysis and BI.
Hard limits: what the agent never does autonomously
#This isn’t about configuration to loosen later. It’s an architectural boundary.
A text-to-SQL agent never autonomously executes INSERT, UPDATE, DELETE, or DDL queries. Even if the user requests data modification via conversation. Even if the prompt looks harmless. The read-only role at the database level is the only defense independent of the model.
The agent never accesses tables containing PII (personal data of customers, employees, contractors) without explicit human approval. These tables are excluded from the schema provided to the model. Access to them is only possible via a human-gate path, where the operator sees what data will be retrieved and approves or rejects the query. Integration of the agent with corporate systems like ERP is covered separately in the article AI Integration with ERP and Systems.
Queries with low model confidence (ambiguous questions, no schema match, empty result when non-empty expected) go to a human with context: the displayed query, reason for escalation, and a suggestion to simplify the question. Broader agent security is covered in the article AI Agent Security.
Observability and result validation
#Observability of a text-to-SQL agent isn’t logging for logging’s sake. It’s the foundation of trust in results and a requirement under the AI Act for systems influencing business decisions.
Every database tool call generates a record containing: the user’s question, the generated SQL query, the number of rows returned, execution time, and guardrail validation results. These records serve three purposes.
First is quality monitoring: what percentage of queries end in success, escalation, or guardrail errors. An increase in escalations signals drift (schema changes, new terms in user questions without glossary updates).
Second is the golden set test: a set of 50-100 questions with expected results run cyclically (weekly or with every schema change). A drop in precision below the threshold alerts before users report errors.
Third is the audit trail required by the AI Act and RODO for systems influencing financial or HR decisions. Every answer must be reproducible: it’s known which query generated the result, from what data, and at what time. Output validation is detailed in the article LLM Output Validation.
Try it live
#Describe your database schema and the analytical question you want to ask, and the model will design a text-to-SQL agent architecture: schema prompting, guardrails, and human-gate points for your scope (playground: PII masked, zero retention):
FAQ
#What is the real precision of a text-to-SQL agent on a production database?
#On a clean, well-documented schema with a defined business glossary, precision ranges from 70 to 85% without additional fine-tuning. On schemas with ambiguous column names, missing foreign keys, or mixed units, precision drops to 40-60% and requires cleanup work first. At Cashcrown, we measure precision on a golden set of 50-100 questions before every production deployment. Without this baseline, you don’t know where you’re starting or how to measure improvement.
Can the agent accidentally modify data in the database?
#No, if the architecture is built correctly. The agent’s connection uses a database role without INSERT, UPDATE, DELETE, or DDL permissions. This layer works independently of the model: even if the model generates a data-modifying query via prompt injection or error, the database rejects it with a permissions error. Additionally, the SQL-parsing guardrail blocks any query containing non-SELECT statements before sending it to the database.
What happens when the agent doesn’t know the answer or is uncertain?
#The agent signals escalation instead of answering: it shows the generated query, states the reason for uncertainty (e.g., unknown term, no schema match, empty result), and asks the operator to verify or simplify the question. This behavior is designed, not a failure. A system that confidently answers questions outside its scope is more dangerous than one that admits ignorance. The structured output pattern with confidence and requires_review fields allows the agent to formally signal this decision.
How many tables can the agent handle in a single call?
#The practical limit is 30-50 tables in the schema context with typical context window sizes of modern models. For larger databases, layering is used: the agent operates on views prepared by a data engineer, not the raw schema. Alternatively, the schema is dynamically filtered based on the question, limiting context to tables relevant to that query. Both techniques require additional configuration.
What does deploying a text-to-SQL agent with Cashcrown look like?
#We start with a schema and business glossary audit (1-2 weeks depending on the number of tables and documentation state). Then we configure guardrails, table allowlists, and the read-only role. Next is the golden set: 50-100 questions with expected results, baseline precision measurement. A pilot with users for 2-4 weeks with full escalation monitoring. Full autonomy only after validating precision and escalation levels below agreed thresholds. Get started with an assessment of readiness or contact us.