
Top 7 Proven Ways to Diagnose Slow SQL with AWR and ASH Reports

Introduction: Why AWR and ASH Are Essential for Slow SQL Diagnosis

When a critical query suddenly slows down, most teams feel the pain long before they know why. In my own work diagnosing slow SQL with AWR and ASH, I’ve found that guessing at the root cause—indexes, stats, or configuration—usually wastes hours while the database continues to struggle.

Oracle’s Automatic Workload Repository (AWR) and Active Session History (ASH) give me the hard evidence I need. AWR shows how performance changes over time, while ASH lets me zoom in to see what active sessions were actually waiting on: disk I/O, CPU, network, latch contention, or specific wait events such as db file sequential read or latch: cache buffers chains. Instead of blaming the SQL text alone, I can tie each slow statement to precise resource bottlenecks.

In this article I’ll walk through practical ways of diagnosing slow SQL with AWR and ASH: spotting problematic SQL IDs, reading wait event profiles, identifying latch and concurrency issues, and validating that any change you make really improves response time. These are the same techniques I rely on when production systems are on fire and we need answers fast.

1. Use AWR Top SQL to Spot the Real Performance Villains

When I’m diagnosing slow SQL with AWR and ASH, I always start with the AWR Top SQL sections. They quickly tell me which statements are really burning CPU, I/O, or elapsed time, so I don’t waste effort tuning a query that isn’t the main culprit.

In the AWR report, I focus on sections such as SQL ordered by Elapsed Time, SQL ordered by CPU Time, and SQL ordered by Gets/Reads. I look for SQL IDs that consistently appear near the top across these lists, with high executions, large buffer gets, or large physical reads. In my experience, the worst offenders usually combine high elapsed time with either high CPU or high I/O, which points me toward the right tuning strategy.
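
When I want the same ranking without the formatted report, a rough query against DBA_HIST_SQLSTAT approximates SQL ordered by Elapsed Time for a snapshot range. Treat the sketch below as a starting point: &begin_snap and &end_snap are substitution variables for the AWR window, and the FETCH FIRST clause assumes 12c or later.

-- Sketch: approximate "SQL ordered by Elapsed Time" for a snapshot range
-- (&begin_snap / &end_snap are substitution variables; FETCH FIRST needs 12c+)
SELECT   sql_id,
         SUM(executions_delta)                 AS execs,
         ROUND(SUM(elapsed_time_delta)/1e6, 1) AS elapsed_s,
         ROUND(SUM(cpu_time_delta)/1e6, 1)     AS cpu_s,
         SUM(buffer_gets_delta)                AS buffer_gets,
         SUM(disk_reads_delta)                 AS disk_reads
FROM     dba_hist_sqlstat
WHERE    snap_id BETWEEN &begin_snap AND &end_snap
GROUP BY sql_id
ORDER BY elapsed_s DESC
FETCH FIRST 10 ROWS ONLY;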

Once I’ve identified a suspect SQL ID in Top SQL, I drill into its plan and ASH samples instead of guessing. This simple discipline—letting AWR’s Top SQL guide where I spend time—has saved me from countless wild goose chases. For readers who want a deeper breakdown of each Top SQL section, I recommend checking AWR Report analysis – Ask TOM – Oracle as a companion reference.

-- Quick way I verify a Top SQL suspect directly in the database
-- (DBA_HIST_SQLSTAT only exposes _TOTAL and _DELTA columns, so use the deltas)
SELECT snap_id,
       executions_delta       AS execs,
       elapsed_time_delta/1e6 AS elapsed_s,
       cpu_time_delta/1e6     AS cpu_s,
       buffer_gets_delta      AS buffer_gets,
       disk_reads_delta       AS disk_reads
FROM   dba_hist_sqlstat
WHERE  sql_id = '&sql_id'
ORDER  BY snap_id;

2. Read AWR Wait Events Like a Pro

Once I’ve used Top SQL to find the main offenders, my next move in diagnosing slow SQL with AWR and ASH is to study the AWR wait events. This is where I see why sessions are slow: are they stuck on CPU, storage, locks, or latch contention?

In the AWR report, I pay close attention to Top Timed Events and the Wait Classes breakdown. If most time is in DB CPU, I know I’m dealing with CPU-bound SQL or poor plans. If I see waits like db file sequential read or db file scattered read, I’m thinking about index access, full scans, and storage performance. Heavy enq: TX waits such as enq: TX – row lock contention tell me concurrency and locking are the main story, while latch: cache buffers chains or similar latch events point to hot blocks and contention in the buffer cache.

Over time I’ve learned to treat each wait class as a quick classification tool: User I/O for storage, Concurrency for locks and latches, Cluster for RAC-related issues, and CPU when DB CPU dominates. That classification then guides my next step—whether to drill into ASH for blocking sessions, check execution plans for bad index usage, or review RAC configuration. For a structured reference mapping each common wait to likely root causes, Oracle Wait Events – Oracle Database Documentation can be very handy alongside your AWR reports.
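
When I don’t have the formatted report at hand, I approximate the Top Timed Events list straight from the AWR history views. The sketch below takes deltas of the cumulative wait counters between two snapshots (&begin_snap and &end_snap as substitution variables); it assumes the instance wasn’t restarted in between, ignores DB CPU (which lives in the time model views), and uses FETCH FIRST, so it needs 12c or later.

-- Rough "Top Timed Events" between two snapshots, from DBA_HIST_SYSTEM_EVENT
-- (the counters are cumulative, so subtract the begin snapshot from the end one)
SELECT   e.wait_class,
         e.event_name,
         e.total_waits - b.total_waits                             AS waits,
         ROUND((e.time_waited_micro - b.time_waited_micro)/1e6, 1) AS time_waited_s
FROM     dba_hist_system_event b
JOIN     dba_hist_system_event e
  ON     e.dbid            = b.dbid
 AND     e.instance_number = b.instance_number
 AND     e.event_id        = b.event_id
WHERE    b.snap_id = &begin_snap
AND      e.snap_id = &end_snap
AND      e.time_waited_micro > b.time_waited_micro
ORDER BY time_waited_s DESC
FETCH FIRST 15 ROWS ONLY;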

3. Drill into ASH Top Wait Events for Real-Time Slow SQL Clues

AWR is great for the big picture, but when I’m diagnosing slow SQL with AWR and ASH under real production pressure, ASH is what lets me zoom in to the exact few minutes when users were shouting. ASH samples active sessions every second, so I can see which SQL was actually waiting, on what, and for how long.

My usual approach is to focus ASH on a narrow time window around the incident and group by wait event, SQL_ID, and sometimes session. That immediately reveals whether the slowdown was driven by I/O waits, CPU pressure, locks, or latch contention, and which SQL IDs were involved. One thing I learned the hard way was that system-wide AWR averages often hide very short, very painful spikes that only show up clearly in ASH.

Here’s a simple pattern I use directly in the database to mimic an ASH “Top Wait Events” view for a problem window:

-- ASH-style Top Wait Events for a short incident window
SELECT   wait_class,
         NVL(event, 'ON CPU') AS event,
         COUNT(*)             AS samples,
         ROUND( 100 * RATIO_TO_REPORT(COUNT(*))
                OVER (), 1 )  AS pct_active
FROM     v$active_session_history
WHERE    sample_time BETWEEN TO_TIMESTAMP('&start_ts', 'YYYY-MM-DD HH24:MI:SS')
                     AND TO_TIMESTAMP('&end_ts',   'YYYY-MM-DD HH24:MI:SS')
GROUP BY wait_class, NVL(event, 'ON CPU')
ORDER BY samples DESC;

Once I see which events dominate, I’ll rerun a similar query grouped by SQL_ID within that event. That tight ASH focus has often been the difference between guessing and confidently saying, “this specific SQL, at this exact time, was blocked on this wait.”
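
As a minimal sketch of that follow-up, with &event as a substitution variable holding the dominant wait event from the previous result:

-- Same incident window, drilled down by SQL_ID for one dominant wait event
SELECT   sql_id,
         COUNT(*)                                          AS samples,
         ROUND(100 * RATIO_TO_REPORT(COUNT(*)) OVER (), 1) AS pct_of_event
FROM     v$active_session_history
WHERE    event = '&event'
AND      sample_time BETWEEN TO_TIMESTAMP('&start_ts', 'YYYY-MM-DD HH24:MI:SS')
                     AND TO_TIMESTAMP('&end_ts',   'YYYY-MM-DD HH24:MI:SS')
GROUP BY sql_id
ORDER BY samples DESC;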

4. Spot and Fix Latch Contention from AWR and ASH

Out of all the issues I run into when diagnosing slow SQL with AWR and ASH, latch contention is one of the trickiest because it can make the whole database feel slow without any single query looking huge. Latches protect shared memory structures; when they get hot, sessions pile up waiting to grab them, and your response times collapse.

The first hint usually comes from AWR. In the Top Timed Events or Wait Events by Class sections, I look for waits in the Concurrency class, especially events like latch: cache buffers chains, latch: shared pool, or library cache: mutex X. When I see these high in the list, I know I’m not just dealing with slow I/O or CPU-bound SQL; I’m dealing with contention around shared memory structures such as hot blocks or library cache objects.

ASH then helps me zoom in. I filter ASH samples where wait_class = 'Concurrency' and group by event, SQL_ID, and sometimes P1/P2 (which can encode latch address or hash values depending on the event). That’s often where patterns appear: the same SQL ID hammering the same blocks, or a flood of similar statements causing library cache or shared pool pain. One thing I learned over time is that latch problems are rarely fixed by a single parameter tweak; more often they point to design or workload patterns.

-- ASH view of latch / concurrency waits tied to SQL
SELECT   NVL(event, 'ON CPU') AS event,
         sql_id,
         COUNT(*)             AS samples,
         ROUND(100 * RATIO_TO_REPORT(COUNT(*)) OVER (), 1) AS pct_active
FROM     v$active_session_history
WHERE    wait_class = 'Concurrency'
AND      sample_time BETWEEN TO_TIMESTAMP('&start_ts', 'YYYY-MM-DD HH24:MI:SS')
                        AND TO_TIMESTAMP('&end_ts',   'YYYY-MM-DD HH24:MI:SS')
GROUP BY NVL(event, 'ON CPU'), sql_id
ORDER BY samples DESC;

From there, my tuning responses depend on the latch type. For cache buffers chains, I look for hot blocks: heavy logical reads on a small set of objects, often due to monotonically increasing keys or hot indexes. In practice, I’ve relieved CBC latch waits by spreading I/O across multiple indexes, reversing keys, or changing access patterns to avoid everyone hammering the same few blocks. For library cache or shared pool–related waits, I focus on reducing hard parses and eliminating SQL that differs only by literal values. That’s where bind variables, cursor sharing, and cleaning up SQL generation in the application can make a huge difference.
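
To turn “hot blocks” from a theory into a list of suspects, I often start with a rough ASH query like the one below. It’s only a sketch: CURRENT_OBJ# isn’t populated for every wait event, so I treat the output as a pointer toward candidate objects rather than proof.

-- Candidate hot objects behind concurrency waits in the incident window
-- (CURRENT_OBJ# is not reliable for every event; treat results as hints)
SELECT   o.owner,
         o.object_name,
         ash.event,
         COUNT(*) AS samples
FROM     v$active_session_history ash
JOIN     dba_objects o
  ON     o.object_id = ash.current_obj#
WHERE    ash.wait_class = 'Concurrency'
AND      ash.sample_time BETWEEN TO_TIMESTAMP('&start_ts', 'YYYY-MM-DD HH24:MI:SS')
                         AND TO_TIMESTAMP('&end_ts',   'YYYY-MM-DD HH24:MI:SS')
GROUP BY o.owner, o.object_name, ash.event
ORDER BY samples DESC;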

When latch contention shows up in AWR and ASH, I treat it as a symptom of a deeper pattern: hot blocks, excessive parsing, or badly skewed access. Once I correlate the waits with the specific SQL and objects involved, it becomes much clearer whether I need to change schema design, indexing strategy, or how the application sends SQL. For a deeper checklist of common latch events and their typical fixes, Latch and Mutex Contention Troubleshooting in Oracle: Tanel Põder is a helpful complement to what you see in your reports.
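
When excessive parsing is the pattern, a quick way I size up the damage is to group the cursor cache by FORCE_MATCHING_SIGNATURE, which collapses statements that differ only by literal values. The sketch below assumes access to V$SQL; the threshold of 50 near-identical cursors is an arbitrary starting point I adjust per system.

-- Statements that differ only by literal values (bind variable candidates)
SELECT   force_matching_signature,
         COUNT(*)      AS similar_cursors,
         MIN(sql_text) AS sample_text
FROM     v$sql
WHERE    force_matching_signature <> 0
GROUP BY force_matching_signature
HAVING   COUNT(*) > 50
ORDER BY similar_cursors DESC;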

5. Use ASH to Uncover Lock Contention and Blocking Sessions

Lock issues are one of the most painful scenarios I see when diagnosing slow SQL with AWR and ASH, because a single bad transaction can stall dozens of innocent queries. AWR usually gives me the first hint, with waits like enq: TX – row lock contention showing up in Top Timed Events. But it’s ASH that really exposes who is blocking whom and which SQL is responsible.

In ASH, I filter on event LIKE 'enq: TX%' over the time window when users complained (depending on the exact contention, these enqueue waits land in the Application or Concurrency wait class). Then I group by blocking_session, session_id, and sql_id. The rows with a non-null blocking_session are the blocked sessions; the session IDs that keep showing up in that column are the blockers. One thing I’ve learned is that the blockers often look perfectly fast from their own perspective; they’re just holding locks too long while everyone else queues up behind them.

-- ASH: who is blocked, who is blocking, and on which SQL
SELECT   sample_time,
         session_id,
         blocking_session,
         event,
         sql_id
FROM     v$active_session_history
WHERE    event LIKE 'enq: TX%'
AND      sample_time BETWEEN TO_TIMESTAMP('&start_ts', 'YYYY-MM-DD HH24:MI:SS')
                     AND TO_TIMESTAMP('&end_ts',   'YYYY-MM-DD HH24:MI:SS')
ORDER BY sample_time, blocking_session;

Once I’ve identified the blocking session and its SQL_ID, I tie that back to the SQL text and application module. In my experience, long-running batch updates without proper commit points, or poorly indexed foreign keys, are frequent culprits. ASH gives me enough detail to go back to developers and say, very specifically, “this update on this table held locks for this long and blocked these queries,” which makes fixing the root cause much more straightforward.
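
Tying a SQL_ID back to its text and module is usually a one-liner. The sketch below checks the cursor cache first and falls back to AWR’s captured SQL text if the cursor has already aged out; &blocking_sql_id is just a substitution variable for the SQL_ID I found in ASH.

-- Look up the blocker's SQL text and application module
SELECT sql_id, module, sql_text
FROM   v$sql
WHERE  sql_id = '&blocking_sql_id';

-- Fallback if the cursor is no longer in the shared pool
SELECT sql_id, sql_text
FROM   dba_hist_sqltext
WHERE  sql_id = '&blocking_sql_id';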

6. Correlate SQL Plans, Stats, and Execution History for Problem Queries

Once AWR and ASH point me to a problem SQL_ID, the next step in diagnosing slow SQL with AWR and ASH is to understand what changed: plan, statistics, or data distribution. In my experience, most “suddenly slow” queries trace back to a different execution plan or fresh stats that pushed the optimizer in a new direction.

I start with the historical plan from DBA_HIST_SQL_PLAN or SQL Plan Management views, comparing the current plan to older, faster ones. I look for shifts from index access to full scans, joins changing order, or new nested loops against large data sets. At the same time, I check DBA_TAB_STATS_HISTORY and related views to see whether table or index statistics were gathered shortly before the slowdown; that’s often the smoking gun.
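
A minimal sketch of that comparison, assuming the plans are still in AWR: list the plan hash values the SQL_ID has used, pull a specific plan with DBMS_XPLAN.DISPLAY_AWR, and check when statistics last changed on the suspect table (&owner, &table_name, and &plan_hash_value are substitution variables).

-- Which plan hash values has this SQL_ID used, and when did each first appear?
SELECT   plan_hash_value,
         MIN(timestamp) AS first_seen
FROM     dba_hist_sql_plan
WHERE    sql_id = '&sql_id'
GROUP BY plan_hash_value
ORDER BY first_seen;

-- Pull one historical plan out of AWR for side-by-side comparison
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_AWR('&sql_id', &plan_hash_value, NULL, 'ALL'));

-- Were statistics regathered on the table just before the slowdown?
SELECT   table_name, stats_update_time
FROM     dba_tab_stats_history
WHERE    owner = '&owner'
AND      table_name = '&table_name'
ORDER BY stats_update_time DESC;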

Then I tie this back to AWR’s execution history for that SQL_ID: elapsed time per execution, buffer gets, and rows processed over multiple snapshots. A pattern I see a lot is stable performance for weeks, followed by a spike exactly after a stats job or structural change. When that happens, I either fix the stats (e.g., histograms, stale stats, or missing stats) or use a SQL baseline to lock in a known-good plan while we address the root cause.

-- Compare historical performance of one SQL_ID across AWR snapshots
SELECT  snap_id,
        executions_delta          AS execs,
        elapsed_time_delta/1e6    AS elapsed_s,
        (elapsed_time_delta/1e6) /
        NULLIF(executions_delta,0) AS avg_elapsed_s
FROM    dba_hist_sqlstat
WHERE   sql_id = '&sql_id'
ORDER BY snap_id;
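
When that history shows a clear plan regression, this is roughly how I pin the known-good plan with a SQL plan baseline while the underlying stats problem gets fixed. It’s a sketch only, assuming the good plan is still in the cursor cache and that SQL Plan Management is available in your edition.

-- Load the known-good plan for &sql_id / &plan_hash_value as a SQL plan baseline
DECLARE
  l_loaded PLS_INTEGER;
BEGIN
  l_loaded := DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE(
                sql_id          => '&sql_id',
                plan_hash_value => &plan_hash_value);
  DBMS_OUTPUT.PUT_LINE('Plans loaded: ' || l_loaded);
END;
/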

This correlation of plan, stats, and history is what lets me confidently say, “performance changed because the plan changed, and the plan changed because stats or environment changed,” instead of just guessing at generic tuning tips.

7. Build a Repeatable Playbook for Diagnosing Slow SQL with AWR and ASH

Over the years I’ve learned that diagnosing slow SQL with AWR and ASH goes much faster when I follow the same playbook every time, instead of improvising under pressure. My workflow is simple and repeatable:

  • Start with AWR Top SQL and Top Timed Events: identify the worst SQL_IDs and whether the pain is CPU, I/O, locks, or contention.
  • Classify the problem via AWR wait classes: decide if you’re chasing storage, CPU, latch, or lock issues.
  • Zoom in with ASH for the incident window: filter by time, SQL_ID, and wait event to see what was active during the slowdown.
  • Use ASH to expose blockers and hot spots: find blocking sessions, latch hot spots, and specific objects being hammered.
  • Correlate with plans and stats: compare historical plans and AWR performance trends to spot what changed.

Once I’ve walked these steps, I document the findings and tunings so the next incident is faster to resolve. Over time, that documentation becomes a living runbook the whole team can rely on, instead of depending on one person’s memory.

Conclusion: Key Takeaways for Faster Slow SQL Diagnosis

When I’m under pressure to fix a production slowdown, diagnosing slow SQL with AWR and ASH is my fastest path from noise to root cause. AWR gives me the strategic view: top SQL, top waits, and whether I’m fighting CPU, I/O, locks, or contention. ASH then lets me zoom into the exact minutes of pain, showing which sessions and SQL were actually active and what they were waiting on.

The real power comes from using them together: AWR to find the suspects, ASH to prove what happened, and then plans and stats history to explain why things changed. Over time, I’ve found that turning this into a repeatable playbook—check AWR, drill with ASH, correlate plans and stats, then document the fix—cuts diagnosis time dramatically and makes each new incident easier to handle.
