Key takeaways
- Start with the execution plan. Guessing wastes time.
- Indexes accelerate reads but slow writes. Add them deliberately, not defensively.
- Filtering early in the query tree usually beats tuning the join algorithm.
- Stale statistics cause more bad plans than bad SQL does.
- Cloud data warehouses optimize differently from transactional databases.
- The cost target is bytes scanned, not just latency.
Most slow query problems aren't actually about SQL syntax. They come down to three things: the optimizer building a bad plan, the storage engine reading more data than it should, or both. Once you treat optimization as a small set of repeatable diagnostic steps rather than a grab bag of tricks, the work stops feeling mysterious. This guide covers the changes that actually move query times from minutes to seconds across both traditional databases and cloud data platforms. The framing is engineer-first: what to check, what to fix, and what to leave alone, with the caveats that come up once you've spent enough time staring at slow queries in production.
What is SQL optimization?
SQL optimization is the practice of making queries return correct results using less time and fewer resources. The work happens at three levels: the query itself (how it's written), the database structure (indexes, partitioning, statistics), and the engine configuration (memory, parallelism, caching). Most teams jump straight to rewriting queries; the higher-impact work is usually at the structural level.
Think of the database optimizer as a route planner. Give it a destination (your query), and it picks a path through the data. Most slow queries aren't the fault of the planner; they're the fault of the inputs the planner has to work with: bad indexes, stale maps, or an address written in the wrong format. The optimizer's job is hard, and it does it well most of the time. When it fails, the cause is usually upstream of the SQL itself.
Rule of thumb
If you're tuning a query before reading the execution plan, you're guessing. The plan tells you whether the optimizer agrees with what you think the query should do, and that's where most diagnoses start.
The work spans both transactional databases (PostgreSQL, MySQL, SQL Server, Oracle, collectively called RDBMS for relational database management systems) and modern cloud data warehouses (Snowflake, Databricks, BigQuery), which use different mechanics but share the same underlying problem: the engine reads more data than the answer requires. The fixes look different on each, but the diagnostic mindset is the same.
Why SQL performance matters
There are three reasons, roughly in the order most teams encounter them.
First, user experience. A dashboard that loads in 30 seconds instead of 3 reshapes how teams use it. Slow queries don't just frustrate users; they cause people to stop checking dashboards, stop running ad-hoc analysis, and stop trusting the data layer entirely. The cost shows up as analyst hours spent waiting and decisions made without the data.
Second, infrastructure cost. On consumption-priced cloud platforms, every unnecessary table scan is a charge on the invoice. A single inefficient daily query can cost more in a year than a senior engineer's salary. The math is rarely visible until someone runs it: a query running 100 times a day at $0.50 per execution costs $18,000 a year, and that's before retries and concurrent runs.
Third, reliability. Fast queries finish before they hit timeouts, exhaust connection pools, or contend with other workloads. Slow queries are operationally expensive in ways that don't show up on the bill. They cause incidents, retries, and on-call pages. Optimizing the slowest 5% of queries usually eliminates a disproportionate share of the team's reliability problems.
The unit-economics argument
A query that runs 100 times per day on a Snowflake medium warehouse can easily cost $100-$300 per day in credits. If the same query, optimized, runs in a quarter of the time, the saved credits over a year fund the engineering time spent optimizing it many times over.
Read the execution plan first
With the why settled, the where to start is always the same: ask the database what it intended to do. Every modern database exposes the optimizer's plan. PostgreSQL has EXPLAIN ANALYZE, MySQL has EXPLAIN, SQL Server has SET STATISTICS and graphical query plans, Snowflake has Query Profile, and BigQuery has the execution graph. Read all of them for the same patterns: full table scans on large tables, nested loops over millions of rows, hash joins spilling to disk, and estimated row counts that diverge from actual row counts by orders of magnitude.
A 10x divergence between estimated and actual rows usually means stale statistics, meaning the database's internal estimates of table size and value distribution have fallen out of sync with reality. This is the route planner working from an old map. Run ANALYZE (or the platform equivalent) and re-check. If the plan still looks wrong, the optimizer is missing context: a missing predicate, an expression that prevents index use, or a function wrapping an indexed column.
What to look for, in order
Start at the leaves of the plan tree, not the top. The deepest operations are usually where the cost is born. A sequential scan on a 200-million-row table will dominate everything above it, no matter how clever the join algorithm at the top of the plan looks. Once you find that bottom-up cost driver, the question becomes whether it's avoidable: is there an index that would make this an index scan, can a predicate be pushed down to filter earlier, or is this query genuinely asking for all the data?
When EXPLAIN lies
EXPLAIN in SQL shows the database’s execution plan for a query, helping understand and optimize performance, indexing, joins, and cost. EXPLAIN without actual execution is just an estimate. The optimizer might predict 1,000 rows, and the query might return 10 million. EXPLAIN ANALYZE on PostgreSQL and ANALYZE TABLE on MySQL run the query and return real numbers. On production, this is sometimes too expensive; in that case, run the query with LIMIT 0 and EXPLAIN to get the plan without execution, then sample on a staging copy. For deeper context on warehouse-specific plan reading, data warehouse optimization covers the differences across platforms.
Plan inspection tools across major databases
Table: Each platform exposes plan inspection differently. Cloud warehouses (Snowflake, BigQuery) use partition and cluster pruning rather than traditional indexes.
Indexing strategies that actually pay off
If the execution plan points to a table scan that should be an index seek, the next question is which index. Index design is mostly about column ordering, and column ordering is mostly about how queries actually filter.
Cover the predicate, not just the column
The most underused index is the composite index that matches your most common WHERE clause. An index on (customer_id, created_at) serves both WHERE customer_id = ? and WHERE customer_id = ? AND created_at > ? queries. An index on (created_at, customer_id) only handles one of them efficiently. Order columns by how often they appear as equality predicates first, range predicates second.
The phonebook analogy is useful here. A phonebook sorted by last name, then first name, lets you find "Smith" easily, then narrow to "Smith, John" within that group. A phonebook sorted by first name, then last name, forces you to scan the entire "John" section to find "Smith." Composite index ordering follows the same logic.
Covering indexes and INCLUDE columns
A covering index contains every column the query needs, so the database never has to fetch from the heap. PostgreSQL and SQL Server both support INCLUDE columns: the indexed columns drive the lookup, the included columns ride along for free. On a query like SELECT email, status FROM users WHERE customer_id = ?, an index on (customer_id) INCLUDE (email, status) eliminates the heap fetch entirely. The trade-off is index size, which directly hits write performance.
Watch the write cost
Each index adds latency to INSERT, UPDATE, and DELETE. On write-heavy tables, more than three or four indexes usually hurts more than it helps. Drop indexes that haven't been used in a month: PostgreSQL exposes pg_stat_user_indexes, SQL Server has sys.dm_db_index_usage_stats, MySQL has performance_schema.table_io_waits_summary_by_index_usage. All three surface zero-use indexes can be removed safely.
Caution
Never drop an index based on usage statistics from a single day. A weekly batch job, monthly close, or quarterly report may use an index that looks dead in daily traffic. Sample at least four weeks before removing.
Partial and expression indexes
Two underused tools. A partial index covers only rows matching a predicate: CREATE INDEX idx ON orders (customer_id) WHERE status = 'pending' is small, fast, and only useful when queries actually filter on that status. An expression index covers a computed value: CREATE INDEX idx ON users (LOWER(email)) makes WHERE LOWER(email) = ? usable by the optimizer (the technical term is sargable, meaning the predicate can use an index). Both reduce index size compared to indexing the full table or column, which makes the maintenance cost cheaper, too.
Joins: Filter before you join
Indexing solves single-table access; joins are where multi-table queries live or die. The join algorithm matters less than what you feed into the join. A hash join across one million rows usually beats a nested loop across 10,000 rows when the predicate is selective enough. The optimizer gets this right when stats are fresh, and predicates are sargable.
What the optimizer can't fix is a join condition hidden behind a function: JOIN orders ON UPPER(orders.email) = UPPER(users.email) defeats any index on email. Normalize the data once at write time and drop the function from the join.
Hash, merge, and nested loop
Each algorithm wins in different conditions. A nested loop is fastest when one side is tiny, and the other has a good index. Hash join wins on larger unsorted inputs. Merge join wins when both sides are already sorted on the join key, which is rare in practice but easy to spot in a plan. The optimizer usually picks correctly; when it doesn't, the cause is almost always stale statistics or a non-uniform data distribution.
Broadcast vs. shuffle on distributed systems
On Spark, Snowflake, and BigQuery, joins between a small dimension and a large fact table benefit enormously from broadcast joins, where the small table is replicated to every executor. Spark's default broadcast threshold is 10MB, which is often too low for modern hardware; raising it to 100MB or 1GB, where memory allows, can eliminate expensive shuffles entirely. The cost is RAM per executor; the benefit is a join that runs without crossing the network.
When CTEs help and when they hurt
For queries joining six or more tables, check whether materializing an intermediate result helps. Common table expressions (CTEs, the WITH clause) are sometimes inlined by the optimizer and sometimes not. The execution plan tells you which. PostgreSQL 12 and later inlines CTEs by default; older versions materialized them. When CTE inlining hurts, a temporary table with an explicit index gives you control.
The mistakes that cost real money
Once the structural foundations (plans, indexes, joins) are right, the remaining performance problems are usually in the SQL itself. Four mistakes account for most slow query tickets:
- Implicit type conversion: WHERE user_id = '123' against an INT column forces a cast that disables the index. Match literal types to column types.
- Leading wildcards: LIKE '%foo' always scans. For substring search, use full-text or trigram indexes (pg_trgm on PostgreSQL).
- SELECT * across joins: especially expensive on columnar warehouses where every extra column adds bytes scanned. Project only what you need.
- Functions on indexed columns: WHERE DATE(created_at) = '2024-01-01' defeats the index. Rewrite as a range: WHERE created_at >= '2024-01-01' AND created_at < '2024-01-02'.
A pre-commit lint rule that flags SELECT * and leading wildcards catches most of them before they ship. Beyond these four, the most expensive mistake in production is OR conditions across non-indexed columns. The optimizer often can't use any index at all, falling back to a full scan. Splitting into UNION ALL of two indexed queries is uglier but routinely 100x faster on large tables.
Schema decisions that affect every query
Query-level fixes have a ceiling. Past a certain table size or workload mix, the schema itself is the bottleneck.
Normalize for writes, denormalize for reads
The textbook rule is to normalize aggressively, then denormalize where reads dominate. In practice, the right answer is workload-shaped: an OLTP system (online transaction processing, the kind of database that handles user-facing CRUD operations) handling thousands of writes per second needs strict normalization to keep update locks small. An OLAP workload (online analytical processing, the analytics side) reading the same dimensions in every query benefits from a denormalized star schema where joins are cheap or eliminated entirely.
Partition large tables before they hurt
Once a table crosses 100 million rows, partitioning becomes the single largest performance lever available. Range partitioning by date is the most common: each month gets its own physical partition, queries filtering on date prune to a single partition automatically, and old data can be detached and archived without affecting active queries. Adding partitioning to an existing one-billion-row table is painful; the migration itself is costly. Design it early.
Data types matter more than you think
A BIGINT column where INT would suffice doubles the index size for that column. A TEXT column where VARCHAR(50) would suffice changes how the database stores rows. On hot tables, choosing the smallest type that holds your data reliably reduces I/O at every layer of the stack.
Cloud data platforms play by different rules
Everything above applies to traditional databases. Cloud data warehouses share the diagnostic mindset (read the plan, find the bottleneck, reduce data scanned), but the mechanics shift.
Snowflake, Databricks, and BigQuery don't have traditional B-tree indexes. Snowflake uses micro-partition pruning and clustering keys, Databricks Delta uses Z-ordering and bloom filters, and BigQuery uses partitioning and clustering. Tuning these means understanding how the platform stores data, not which index hint to add.
The cost model also shifts. On consumption-priced platforms, the optimization target is bytes scanned, not just latency. A BigQuery query that scans 10TB costs over $60. A Snowflake warehouse left running for an idle hour wastes credits regardless of whether queries are fast or slow. Optimization on these platforms is always a cost-and-latency calculation, not pure latency.
Worked example
A daily aggregation that scans a 5TB partitioned BigQuery table costs about $30 per run. Pruning to the relevant 50GB through a partition filter drops that to $0.30. Same query, same answer, 100x cheaper. Over a year, that one fix saves more than $10,000 on a single pipeline.
Snowflake cost optimization and cloud data cost optimization cover the platform-specific patterns in detail.

