Key takeaways
- Cortex AI Functions, Cortex Search, Cortex Analyst, and Cortex Agents each bill through separate Account Usage views that warehouse cost dashboards do not roll up by default.
- Agent workflows multiply LLM calls per user request. A single question can trigger 5 to 15 underlying calls.
- Re-embedding entire tables on every change is the single most expensive Cortex Search anti-pattern.
- Setting hard iteration limits and tool call caps is the cheapest defense against runaway agent loops.
- As of April 1, 2026, Snowflake bills AI features in a separate currency called AI Credits at a flat $2.00 (Global)/$2.20 (Regional) per credit, decoupled from your Snowflake Edition price.
The bill comes in. Snowflake usage is up 40% over the prior month. Compute hours look normal. Storage hasn't moved. You dig in, and the cost is concentrated in a few line items that didn't even exist on the invoice last quarter: Cortex Search serving, Cortex Analyst requests, and AI_COMPLETE token usage. Someone shipped an agentic feature in production. It works. It also costs five times what the team estimated. This is the modern Snowflake cost problem in a nutshell.
Agentic AI is genuinely useful, but the cost surface area is wider than most teams realize, and the failure modes are non-obvious. This guide covers what's actually changing on your invoice when you turn on agentic AI in Snowflake, where the hidden multipliers come from, and what to do about them before someone in finance asks why the bill doubled.
What is agentic AI in Snowflake?
Agentic AI is a category of AI applications where a language model doesn't just answer a question; it reasons through what steps to take to answer the question. It calls tools, reads results, decides what to do next, and loops until it has an answer it considers good enough. In Snowflake, this shows up as Cortex Agents and Snowflake Intelligence, both of which became generally available in late 2025. They orchestrate calls between an LLM (large language model, the AI doing the reasoning), Cortex Search (Snowflake's vector retrieval service), Cortex Analyst (Snowflake's text-to-SQL service), and the warehouse itself.
Picture the difference this way: a regular SQL query is a contractor with a blueprint. They show up, do exactly what the blueprint specifies, and leave. An agent is a contractor without a complete blueprint. They show up, look at the situation, decide on the next step, do it, look at the result, and decide on the next step. Sometimes that produces better results. It also takes more time, more materials, and is harder to budget for in advance.
Why this matters for your bill
A traditional Snowflake query is one unit of work. An agent invocation is many units of work, often unpredictable in number. The cost surface expands with each tool call, each LLM round trip, and each retry on a failed sub-step.
Why did my Snowflake bill spike after enabling agentic AI?
Compute surge from AI-driven query expansion
Three things happen at once. First, Cortex functions hit a separate, token-priced billing line that's distinct from warehouse compute. Calling AI_COMPLETE on a million rows isn't measured in warehouse credits. It's measured in tokens, billed in AI Credits, and a typical model run on a few thousand tokens of context per row adds up fast.
Hidden credit burn from iterative AI workflows
Second, agent workflows aren't single queries. A user asks one question; the agent decomposes it into sub-tasks, calls a search tool, calls a SQL generation tool, runs the SQL, evaluates the result, and possibly retries. One human question becomes 5–15 underlying LLM calls and a handful of warehouse queries. If your team estimated cost based on "calls per day" without modeling the agent multiplier, your forecast was off by an order of magnitude from day one.
Concurrency multiplier from parallel agent tasks
Third, parallel agent runs compete for the same compute. Cortex Search serving compute scales out under load. If your Slackbot, your dashboard, and your analyst tool are all using Cortex Agents simultaneously, you're paying for serving compute on every concurrent path.
A probable scenario
A team rolled out a Cortex Analyst-backed dashboard for sales reps. In testing, it cost about $200 per week. In production with 80 daily users, it cost over $4,000 in the first week. The math wasn't subtle in retrospect: each user asked about three questions per day, each question triggered an agent chain, and each chain involved multiple Cortex Analyst requests when SQL generation failed on the first try. The product worked beautifully; the unit economics didn't.
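The unit economics in this scenario can be backed out with simple arithmetic. The sketch below uses the figures from the paragraph above; the number of active days per week is an assumption, since the scenario does not state it.

```python
def cost_per_question(weekly_cost_usd, daily_users, questions_per_user_per_day,
                      active_days_per_week):
    """Average cost of one user question over a week."""
    questions_per_week = (
        daily_users * questions_per_user_per_day * active_days_per_week
    )
    return weekly_cost_usd / questions_per_week


# Figures from the scenario: $4,000 in the first week, 80 daily users,
# about 3 questions each. active_days_per_week=5 is an assumption
# (a sales team working weekdays).
production = cost_per_question(4000, 80, 3, active_days_per_week=5)
print(f"${production:.2f} per question")
```

Tracking cost per question rather than cost per week makes the gap between testing and production visible as soon as volume changes.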
Snowflake cost anomalies access covers more on auditing this kind of usage spike.
How agentic AI in Snowflake works under the hood
Core components: Models, orchestration, and data layers
Snowflake Intelligence and Cortex Agents share a similar architecture: an LLM that orchestrates calls to tools, which include Cortex Search (vector retrieval), Cortex Analyst (text-to-SQL), and direct warehouse query execution. The orchestration layer drives the cost. The orchestrator is itself an LLM call: every step the agent takes is a model invocation that takes the conversation history and tool outputs as input and produces the next action. As the conversation grows, prompts grow, and per-call token costs scale. Snowflake's Cortex AI Functions overview lists the full set of AI_* functions (AI_COMPLETE, AI_EMBED, AI_EXTRACT, AI_CLASSIFY, AI_AGG, and more) that agents call under the hood.
How agent workflows translate into query execution
The tool layer is where physical compute happens. A Cortex Search call against an indexed corpus consumes serving compute (priced per gigabyte-month of indexed data at one-second resolution) plus embedding token costs on the source side. Cortex Analyst calls go to Snowflake's text-to-SQL service and then execute against your warehouse. A single Analyst call, therefore, generates two distinct charges: the Analyst request and the warehouse credits to run the SQL it produced.
Where AI actions map to Snowflake credit usage
The data layer is what changes over time. Embeddings (numerical representations of text used for vector search) live in vector columns, and re-indexing happens when source data changes. If you've configured Cortex Search to refresh on every upstream load, you're re-embedding rows that haven't changed, paying for embedding compute proportional to the corpus, not the delta. On a 10M-row table updated daily, that's an order of magnitude more cost than incremental refresh. The mapping to credits is non-obvious because Snowflake reports AI usage in separate views under SNOWFLAKE.ACCOUNT_USAGE. Standard warehouse cost dashboards miss them entirely.
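To see why full refresh is so much more expensive than incremental refresh, it helps to count the rows sent to the embedding model. The sketch below is illustrative only: the 10M-row table comes from the example above, while the 5% daily change rate is an assumption chosen for the arithmetic, not a measured figure.

```python
def rows_embedded(total_rows, daily_change_rate, days, incremental):
    """Rows sent to the embedding model over `days` daily refreshes."""
    if incremental:
        # Only the rows that changed since the last refresh are re-embedded.
        return round(total_rows * daily_change_rate) * days
    # Full refresh re-embeds the whole table every day.
    return total_rows * days


TOTAL_ROWS = 10_000_000    # table size from the example above
DAILY_CHANGE_RATE = 0.05   # assumption: 5% of rows change per day

full = rows_embedded(TOTAL_ROWS, DAILY_CHANGE_RATE, days=30, incremental=False)
delta = rows_embedded(TOTAL_ROWS, DAILY_CHANGE_RATE, days=30, incremental=True)
print(f"full refresh: {full:,} rows, incremental: {delta:,} rows")
print(f"ratio: {full / delta:.0f}x")
```

Under these assumptions the ratio is simply one over the change rate, so the lower the daily churn, the more a full refresh overpays.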
Recent pricing change worth knowing
As of April 1, 2026, Snowflake introduced AI Credits as a separate billing currency for covered AI features, flat-priced at $2.00 (Global)/$2.20 (Regional) per credit and decoupled from Snowflake Edition pricing. Teams on premium editions saw their AI bill drop overnight; teams on Standard saw no change. Reporting still flows through the Account Usage views below, but the credit-to-dollar math is now uniform across editions.
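With a flat price per credit, converting AI Credits to dollars is a single multiplication. The helper below is a minimal sketch using the two prices quoted above; the function name and the "global"/"regional" keys are illustrative, and Decimal is used to avoid floating-point rounding on currency.

```python
from decimal import Decimal

# Flat AI Credit prices quoted above (USD per credit).
AI_CREDIT_PRICE_USD = {
    "global": Decimal("2.00"),
    "regional": Decimal("2.20"),
}


def ai_credits_to_usd(credits, region_type="global"):
    """Convert AI Credits to dollars; region_type is 'global' or 'regional'."""
    return Decimal(str(credits)) * AI_CREDIT_PRICE_USD[region_type]


print(ai_credits_to_usd(1500))
print(ai_credits_to_usd(1500, "regional"))
```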
Cortex billing surfaces by service
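One way to keep the four billing surfaces in view is a small lookup from service to Account Usage view. The pairing below is inferred from the view names used in this guide, so confirm it against your own account. sample_query is a hypothetical helper that only builds a SQL string for inspecting a view; it deliberately selects all columns because column names differ between views.

```python
ACCOUNT_USAGE_SCHEMA = "SNOWFLAKE.ACCOUNT_USAGE"

# Service-to-view pairing inferred from the view names in this guide.
BILLING_VIEWS = {
    "Cortex AI Functions": "CORTEX_AISQL_USAGE_HISTORY",
    "Cortex Search": "CORTEX_SEARCH_DAILY_USAGE_HISTORY",
    "Cortex Analyst": "CORTEX_ANALYST_USAGE_HISTORY",
    "Cortex Agents": "CORTEX_AGENT_USAGE_HISTORY",
}


def sample_query(service, limit=100):
    """Build a SQL string that previews the usage view for one service."""
    view = BILLING_VIEWS[service]
    return f"SELECT * FROM {ACCOUNT_USAGE_SCHEMA}.{view} LIMIT {int(limit)}"


for service in BILLING_VIEWS:
    print(sample_query(service, limit=10))
```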
Hidden cost drivers: Storage, compute, and AI service fees
Storage growth from intermediate AI outputs and logs
Storage is the smallest of the three drivers, but the easiest to overlook. Vector embeddings on a wide table can add 30–50% to the table's stored size. Conversation logs and tool call traces (which most teams keep for debugging and audit) accumulate quickly. A team retaining full agent transcripts for 90 days on a chatbot with 1,000 daily conversations stores a few hundred GB of trace data. That's small in isolation but adds up across multiple agent applications.
Compute costs from repeated model inference cycles
Compute is the dominant driver, and within compute, repeated inference is the most expensive failure mode. It takes a few specific shapes. The most common is an agent loop without a hard iteration cap that retries when a tool returns an unexpected result; without a limit, an agent that can't find the right answer will burn tokens trying. The second is batch generation patterns, where every row triggers an independent inference call; batching the same workload into a smaller number of multi-row prompts can cut costs by 5 to 10x. The third is over-aggressive re-embedding of vector corpora on every upstream change.
Additional AI service charges and pricing tiers
AI service fees are the line items most teams discover first because they appear under names that don't match warehouse usage. Cortex Search bills serving compute by the gigabyte-month of indexed data, metered at one-second resolution, while the index is active. Cortex Analyst charges per request, with separate rates for different tiers. Document AI charges per page processed. Each one needs to be tracked separately, because rolled-up Snowflake spend doesn't break them out by default. The AUTO_SUSPEND property on Cortex Search Services is the simplest way to stop paying for serving compute during predictable idle periods.
Can you estimate Snowflake AI services' cost before you deploy?
Building a cost model based on workload patterns
Yes, but with caveats. The estimation approach that works in practice has three inputs: expected user volume, average tokens per user interaction, and the agent multiplier (how many underlying calls a typical user request generates). Token estimation is the easy part: instrument your dev environment, run a few hundred representative queries, and pull token counts from CORTEX_AISQL_USAGE_HISTORY. Multiply by your model's per-token rate from the Snowflake Service Consumption Table. This gets you a per-interaction cost figure that is typically accurate to within about 20%.
Using query history and benchmarks for estimation
The agent multiplier is harder. A user-facing question typically produces 3 to 5x its base prompt cost in agent overhead. A complex analytical question that requires the agent to call Cortex Search, then Cortex Analyst, then re-evaluate, then run a follow-up query can hit 10x. The most reliable way to model this is to instrument actual agent runs through CORTEX_AGENT_USAGE_HISTORY and measure end-to-end credits per request, including the tool calls each request triggered.
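Once agent runs are instrumented, the multiplier can be computed from the logged calls. The sketch below assumes each log entry has already been exported as a dict with a request_id and a credits value; that record shape is hypothetical and does not reflect the actual columns of any usage view.

```python
from collections import defaultdict


def credits_per_request(call_log):
    """Sum the credits of every underlying call, grouped by user request."""
    totals = defaultdict(float)
    for entry in call_log:
        totals[entry["request_id"]] += entry["credits"]
    return dict(totals)


def agent_multiplier(call_log, base_prompt_credits):
    """Average end-to-end credits per request, relative to one base prompt."""
    totals = credits_per_request(call_log)
    if not totals:
        raise ValueError("call_log is empty")
    average = sum(totals.values()) / len(totals)
    return average / base_prompt_credits


# Hypothetical log: request r1 made three calls, request r2 made four.
log = [
    {"request_id": "r1", "credits": 0.5},
    {"request_id": "r1", "credits": 0.25},
    {"request_id": "r1", "credits": 0.25},
    {"request_id": "r2", "credits": 0.5},
    {"request_id": "r2", "credits": 0.5},
    {"request_id": "r2", "credits": 0.5},
    {"request_id": "r2", "credits": 0.5},
]
print(agent_multiplier(log, base_prompt_credits=0.5))
```

Measuring per request rather than per call is the point: the tool calls and retries a request triggers are counted against that request.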
Limitations of pre-deployment cost forecasting
The workload pattern is where forecasts diverge from reality most often. The same agent with 10 users in beta and 1,000 users in production behaves differently because failure modes scale. Agent retry loops that happen in 5% of beta interactions might happen in 15% of production interactions when users ask questions the agent wasn't designed for. The honest forecast number is therefore your dev measurement times user volume times 1.5 to 3x for production failure modes. Anything tighter than that range is overconfident. Snowflake cost optimization covers more on baseline cost modeling techniques, and the Snowflake cost savings calculator is a useful sanity check on the baseline math before you start.
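The forecast described above reduces to a range rather than a point estimate. A minimal sketch, assuming cost_per_interaction is your measured dev figure with agent overhead already included; the function and parameter names are illustrative.

```python
def forecast_range(cost_per_interaction, interactions_per_period,
                   low_factor=1.5, high_factor=3.0):
    """Return (low, high) spend for a period.

    low_factor and high_factor are the 1.5x to 3x production
    failure-mode allowance described above.
    """
    baseline = cost_per_interaction * interactions_per_period
    return baseline * low_factor, baseline * high_factor


# Example inputs are made up: $0.50 per interaction, 1,000 interactions per week.
low, high = forecast_range(0.5, 1000)
print(f"Expect between ${low:,.0f} and ${high:,.0f} per week")
```

Reporting both ends of the range keeps the uncertainty visible to whoever approves the budget.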
What strategies cut Snowflake AI cost without hurting performance?
Cap agent iterations
Set a hard max_iterations limit on every agent. Five iterations handle 95% of legitimate questions; ten cover nearly all of them; an unbounded loop can run for 50 or more steps when the agent is confused. The cap is a one-line config change that often cuts costs by 20 to 30% on production agents.
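If you drive an agent loop from your own application code, the same cap can be enforced there. The sketch below is an application-level wrapper, not Cortex Agents configuration: StepResult, run_with_caps, and the step callable are all hypothetical names, and the tool call cap reflects the defense mentioned in the key takeaways.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StepResult:
    answer: Optional[str] = None  # set once the agent has a final answer
    tool_calls: int = 0           # tool calls made during this step


def run_with_caps(step: Callable[[int], StepResult],
                  max_iterations: int = 5,
                  max_tool_calls: int = 10) -> Optional[str]:
    """Run an agent step function until it answers or hits a cap."""
    tool_calls_used = 0
    for iteration in range(max_iterations):
        result = step(iteration)
        tool_calls_used += result.tool_calls
        if result.answer is not None:
            return result.answer
        if tool_calls_used >= max_tool_calls:
            break  # tool budget spent; stop before the next LLM round trip
    return None  # caller decides on the fallback message


def confused_agent(iteration: int) -> StepResult:
    # Simulates an agent that never finds an answer.
    return StepResult(tool_calls=1)


print(run_with_caps(confused_agent))  # gives up after 5 iterations
```

Returning None instead of raising lets the application show a "could not answer" message, which is cheaper than letting the loop continue.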
The intern rule
An agent without iteration caps is the over-eager intern who keeps coming back to ask one more clarifying question. Helpful in moderation, expensive in excess. Cap it the same way you would manage any other unbounded loop.
Switch to incremental embedding refresh
On Cortex Search, configure incremental refresh rather than full reindexing. Source tables almost always update only a small percentage of rows per day. Re-embedding only the changed rows cuts indexing cost in proportion to the change rate, often a saving of 90% or more on slowly updating corpora.
Batch where possible
For batch workloads (classification, summarization, extraction), aggregate rows into multi-row prompts where the model can handle it. Going from 1,000 single-row calls to 100 ten-row calls reduces the per-call overhead and cuts costs meaningfully. This only works for tasks the model can handle in batch; agent-style interactions don't batch.
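The batching step itself is simple chunking plus a prompt that asks for one result per item. A minimal sketch with hypothetical helper names; the prompt wording is an assumption, and how you send the prompt to a model is left out.

```python
def batch_rows(rows, batch_size=10):
    """Split rows into consecutive batches of at most batch_size items."""
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]


def build_batch_prompt(instruction, batch):
    """Combine one instruction with a numbered list of row texts."""
    numbered = "\n".join(
        f"{index}. {text}" for index, text in enumerate(batch, start=1)
    )
    return f"{instruction}\nReturn one result per numbered item.\n{numbered}"


reviews = [f"review text {n}" for n in range(1000)]
batches = batch_rows(reviews, batch_size=10)
print(len(batches))  # 1,000 rows become 100 prompts
print(build_batch_prompt("Classify the sentiment of each review.", batches[0][:2]))
```

Numbering the items gives you a way to map results back to rows; validate that the model returned exactly one result per item before trusting the batch.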
Cache deterministic outputs
If your application asks the same question repeatedly (a Slackbot answering common queries), cache responses keyed on normalized input. A cache hit rate of 30% on a high-volume agent application directly translates to 30% cost reduction. For Anthropic and OpenAI models on Cortex, prompt caching applies automatically at the model layer once context length crosses certain thresholds, which compounds the application-layer cache savings.
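An application-layer cache keyed on normalized input can be sketched in a few lines. The normalization here (lowercasing and collapsing whitespace) is an assumption and is deliberately conservative; ResponseCache and its methods are hypothetical names, and a production cache would also need expiry so answers do not go stale.

```python
import hashlib


def normalize(question: str) -> str:
    """Lowercase and collapse whitespace so trivial variants share a key."""
    return " ".join(question.lower().split())


class ResponseCache:
    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, question, compute):
        """Return a cached answer, or call compute(question) and store it."""
        key = hashlib.sha256(normalize(question).encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        answer = compute(question)
        self._store[key] = answer
        return answer

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


cache = ResponseCache()
expensive_agent_call = lambda q: f"answer to: {normalize(q)}"  # stand-in for the agent
cache.get_or_compute("What was Q3 revenue?", expensive_agent_call)
cache.get_or_compute("what was  Q3 revenue?", expensive_agent_call)
print(cache.hit_rate())
```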
Right-size the model
Cortex offers multiple models at different price points. Use the smallest model that meets the accuracy bar for the specific task. Agent orchestrators benefit from larger models because they reason about tool calls; individual classification tasks rarely need the same horsepower. The default of using a flagship model for everything is a common cost mistake; tier the model selection by task complexity. Snowflake's Managing Cortex AI Function costs with Account Usage guide also walks through how to wire spending limits and notification alerts on these views.
Continuous visibility and spend control
Tracking agentic AI spend at the level of granularity needed to act on it requires monitoring more than the warehouse cost dashboard. CORTEX_AISQL_USAGE_HISTORY, CORTEX_SEARCH_DAILY_USAGE_HISTORY, CORTEX_ANALYST_USAGE_HISTORY, and the newer CORTEX_AGENT_USAGE_HISTORY need to be aggregated, attributed to applications and users, and watched for anomalies. A 50% week-over-week jump in agent token usage is usually catchable in real time, but only if someone is looking. AI agentic observability covers what continuous monitoring on Cortex usage looks like in practice across all four billing surfaces.
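A week-over-week check like the one described above is straightforward once usage has been aggregated per application. The sketch below assumes weekly totals have already been pulled into a dict of lists, oldest week first; that shape and the function names are hypothetical.

```python
def week_over_week_change(previous, current):
    """Fractional change between two weekly totals, or None if no baseline."""
    if previous <= 0:
        return None
    return (current - previous) / previous


def flag_spikes(weekly_usage, threshold=0.5):
    """Return applications whose latest week grew by at least `threshold`."""
    flagged = []
    for app, totals in weekly_usage.items():
        if len(totals) < 2:
            continue  # not enough history to compare
        change = week_over_week_change(totals[-2], totals[-1])
        if change is not None and change >= threshold:
            flagged.append(app)
    return flagged


# Made-up weekly token totals per application.
usage = {
    "slackbot": [100_000, 160_000],
    "dashboard": [100_000, 120_000],
}
print(flag_spikes(usage))
```

The 0.5 default mirrors the 50% jump mentioned above; tune it per application, since a noisy low-volume app will trip a fixed threshold more often.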



