
Case Study: Fixing FastAPI Event Loop Blocking in a High-Traffic API

Introduction

When I first rolled out this FastAPI service, everything looked great in staging: low latency, clean logs, and CPU usage well within limits. But a few weeks after going live under real production traffic, the symptoms started creeping in—sporadic latency spikes, requests piling up in the queue, and occasional timeouts that were nearly impossible to reproduce in a controlled environment. On the surface, the code looked “async-friendly,” but underneath, subtle FastAPI event loop blocking was quietly undermining performance.

In this case study, I’ll walk through how I diagnosed and fixed a set of blocking behaviors inside the FastAPI event loop in a high-traffic API. The issues weren’t dramatic, like an obvious infinite loop; they were small, scattered blocking calls that only became painful at scale. What I learned the hard way is that you can have a mostly asynchronous stack and still lose the benefits if just a few critical paths block the loop.

By the end of this article, you’ll see how I:

  • Recognized the real-world symptoms of event loop blocking under production load.
  • Instrumented the API to prove the loop was being blocked and identify hot spots.
  • Refactored blocking code (including database access, third-party SDKs, and CPU-heavy logic) into non-blocking patterns.
  • Measured before/after improvements in tail latency, throughput, and resource usage.

My goal is to give you a clear, practical path to follow if you suspect FastAPI event loop blocking in your own services, without needing exotic tools or rewriting your entire stack.

Background & Context

The Product and Core Use Case

The FastAPI service in this case study powers a real-time analytics API for a B2B SaaS product. Clients send small JSON events (typically 1–2 KB) and then query aggregated metrics that feed dashboards, alerts, and internal decision-making tools. From the start, I designed the service around low-latency writes and fast reads, because users were highly sensitive to lag between sending an event and seeing it reflected in their dashboards.

At a high level, the service exposes two main groups of endpoints:

  • Ingestion endpoints: high-volume, small-payload POSTs from web and mobile apps.
  • Query endpoints: read-heavy GETs for aggregated metrics, filtered by customer, time range, and a handful of dimensions.

Because most of the business logic was I/O-bound—database queries, HTTP calls to internal services, and some cache lookups—FastAPI seemed like a perfect fit for maximizing concurrency with a relatively small number of workers.

High-Level Architecture and Tech Stack

Before the FastAPI event loop blocking issues surfaced, the architecture was intentionally straightforward. I wanted something I could reason about quickly during incidents, without layers of complexity hiding the root cause of performance problems.

The core pieces looked like this:

  • FastAPI app running behind an Nginx ingress in Kubernetes, served by Uvicorn workers managed via Gunicorn.
  • PostgreSQL as the primary data store for both raw events and pre-aggregated tables.
  • Redis for hot-path caching of popular query results and rate-limiting data.
  • A few internal HTTP services (authentication, billing, feature flags) called on each request.

Each Kubernetes pod ran several Uvicorn workers, each with its own event loop. To keep things simple, I initially mixed async and sync code: async endpoints calling out to a synchronous ORM, a couple of CPU-heavy transformations (JSON normalization and validation), and one legacy SDK that only exposed blocking I/O operations.

In my experience, this kind of architecture is common when teams adopt FastAPI incrementally: you get the async definitions in place, but you still lean on older sync tooling that seems “good enough” in early environments.

Traffic Profile and Non-Functional Requirements

When the service went live, the traffic profile was already non-trivial, and the growth projections were aggressive. On an average weekday we saw:

  • 2–3k requests per second at steady state across all endpoints.
  • Short, bursty spikes up to 5–6k RPS during customer batch jobs, product launches, and regional traffic peaks.
  • 95th percentile latency target of under 150 ms for ingestion and under 250 ms for queries.

From a non-functional perspective, I committed to a few hard constraints with stakeholders:

  • High availability: the API needed to tolerate node failures and rolling deployments without noticeable degradation.
  • Predictable tail latency: not just fast averages, but tight p95 and p99 latencies, because dashboards and webhooks were highly sensitive to outliers.
  • Efficient resource usage: we wanted to avoid scaling horizontally at every small traffic increase, keeping compute costs under control.

In early load tests, the FastAPI stack easily met these goals. However, as real-world traffic patterns emerged—with uneven bursts, noisy neighbors in the cluster, and a few heavier-than-expected queries—the hidden FastAPI event loop blocking started to matter. What had been a clean, seemingly asynchronous architecture on paper began to show its seams under sustained load, and that’s when I had to dig deeper into where the loop was actually spending its time.

The Problem: Symptoms of FastAPI Event Loop Blocking

Latency Spikes and “Serial” Behavior Under Load

The first thing I noticed was a subtle but persistent drift in latency. Median response times were mostly fine, but p95 and p99 crept upwards over a few weeks. During traffic spikes, some requests suddenly took 1–3 seconds, even though the underlying database queries and cache calls were still fast when measured in isolation.

What really caught my attention was how the service behaved under controlled load tests. When I pushed a single endpoint with concurrent requests, I expected FastAPI to handle them in parallel on the event loop. Instead, the pattern looked almost serial: as concurrency increased, throughput plateaued far earlier than it should have, while latency ballooned. In my experience, that kind of behavior is a classic sign that something is blocking the loop long enough to stall other coroutines.

At this stage, the code looked asynchronous—endpoints were defined with async def, and I was using await liberally. But the runtime behavior suggested that, under the hood, parts of each request were still running like a traditional synchronous app, effectively undermining FastAPI’s concurrency model.

Confusing Metrics: CPU, RPS, and Timeouts

The production metrics made the diagnosis even more confusing. On the surface, nothing screamed “we’re overloaded”:

  • CPU utilization on the pods was moderate, often hovering around 50–60% even during peak traffic.
  • Database metrics showed healthy query times and low lock contention.
  • Redis was nowhere near capacity, and network I/O looked normal.

Yet at the same time, I was seeing:

  • Increasing request queue times at the load balancer level.
  • Periodic gateway timeouts for downstream clients.
  • A noticeable drop in effective RPS per pod long before hitting the theoretical limits.

One thing I’ve learned is that this combination—moderate CPU, healthy dependencies, but rising tail latency and timeouts—often points to an event loop problem rather than a raw capacity issue. The workers were alive and not CPU-saturated, but they were spending too much time stuck in blocking calls, unable to interleave other coroutines.

To make matters trickier, traditional APM dashboards didn’t clearly differentiate between “time spent in async I/O” and “time spent blocking the event loop.” At first glance, the traces just looked like “busy endpoints.” It wasn’t obvious that other coroutines were being starved while a few slow paths hogged the loop (for a related discussion, see “Discover what is blocking the event loop” on Stack Overflow).

Early Clues Inside the Code

The final clues came when I started walking through some of the hot-path endpoints line by line. A few patterns jumped out immediately:

  • Synchronous ORM calls inside async endpoints, often buried behind helper functions that looked innocent.
  • CPU-heavy data transformations (JSON normalization, signature verification) running directly on the event loop instead of a worker pool.
  • A legacy third-party SDK that performed blocking HTTP requests without any async support.

In one ingestion endpoint, for example, the code flow looked roughly like this:

@app.post("/events")
async def ingest_event(payload: dict):
    # 1. Synchronous normalization (CPU-heavy on large payloads)
    normalized = normalize_payload_sync(payload)

    # 2. Blocking DB write through sync ORM
    save_to_db_sync(normalized)

    # 3. Blocking HTTP call to an internal service via legacy SDK
    billing_client.notify_event_sync(normalized["customer_id"])

    return {"status": "ok"}

When I first wrote this, it felt like a pragmatic compromise: keep the FastAPI layer async, but reuse existing sync components. Under light load, it behaved well enough that no one complained. Under sustained, high concurrency, though, each of these calls effectively paused the event loop for that worker. The result was textbook FastAPI event loop blocking: other requests waiting in line, inflated tail latency, and the illusion of serial execution on what should have been a highly concurrent system.

Once I connected these symptoms—serial-like behavior, confusing metrics, and suspicious sync calls hidden in async endpoints—it became clear that the event loop itself was the bottleneck, not the underlying infrastructure. The next step was to prove it with targeted instrumentation and then systematically eliminate the blocking paths.

Constraints & Goals

Organizational and Technical Constraints

When I set out to fix the FastAPI event loop blocking issues, I couldn’t treat it as a greenfield rewrite. There were real constraints that shaped every decision. The API was already embedded in a larger platform, and multiple product teams depended on its existing contract and uptime. Breaking changes to request/response schemas were off the table, and we had to preserve the same deployment model: Kubernetes, Nginx ingress, and a mix of internal and external clients that weren’t going to change their usage patterns overnight.

On the technical side, I also had to respect the stack decisions we’d already made: PostgreSQL and Redis were non-negotiable, and several internal services exposed only synchronous HTTP clients or older SDKs. I couldn’t simply replace the database layer or introduce a completely new message bus in the middle of an incident-driven performance effort. That meant any fix had to work with our existing infrastructure, not around it.

Team Skills, Budget, and Time

Another reality was the composition of the team. Most of us were comfortable with Python, but only a subset had deep experience with async I/O and event loops. In my experience, throwing highly experimental async patterns into a codebase that a broader team has to maintain is a great way to create future incidents. So I needed solutions that were idiomatic but also understandable to teammates who mostly lived in the synchronous Django or Flask world.

From a budget and time perspective, we didn’t have the luxury of a multi-month refactor. The mandate was clear: improve tail latency and stability within a few sprints, without a major increase in infrastructure costs. Scaling pods horizontally indefinitely would have solved some symptoms, but finance had already flagged our compute spend. That pushed me toward targeted, high-leverage fixes—wrapping blocking calls, moving specific code paths off the loop—rather than broad architectural overhauls.

Performance and Reliability Targets

With those constraints in mind, I defined explicit goals so we’d know when the FastAPI event loop blocking problems were “good enough” to move on. The primary performance targets were:

  • p95 latency < 150 ms for ingestion endpoints and < 250 ms for query endpoints under normal peak load.
  • Stable p99 latency, with a clear reduction in long-tail outliers caused by blocking operations.
  • Higher effective throughput per pod, delaying the need for horizontal scaling while keeping CPU utilization reasonable.

On the reliability side, my goals were:

  • No increase in error rates or timeout frequency during and after the changes.
  • Zero-downtime rollouts, using canary deployments and gradual traffic shifting.
  • Observable behavior: enough instrumentation around the event loop and key endpoints to quickly detect regressions in the future.

Putting these goals in writing helped keep the work focused. Instead of chasing theoretical optimizations, I could align every change with a concrete outcome: better concurrency, fewer blocked workers, and a smoother experience for the teams and customers relying on this FastAPI service.

Approach & Strategy for Diagnosing FastAPI Event Loop Blocking


Reproducing Production Behavior with Targeted Load Tests

Before touching the code, I wanted a reliable way to reproduce the production symptoms on demand. In my experience, chasing FastAPI event loop blocking without a good load test is like debugging in the dark: every fix feels speculative. So I started by designing a small but focused suite of load tests that mimicked our real traffic profile—short bursts, mixed endpoints, and realistic payload sizes.

I created separate scenarios for ingestion and query endpoints, gradually increasing concurrency while watching throughput and latency curves. What I was looking for specifically was the “tell-tale” pattern: throughput flattening while p95 and p99 latencies climbed, even though CPU wasn’t pegged. That pattern strongly suggests the event loop is spending too much time blocked on some operations.

To make this actionable, I tagged each load test with a scenario name and pushed that into request headers so I could correlate logs and metrics later. Here’s a simplified example of how I drove concurrent requests from a Python script:

import asyncio
import httpx

async def worker(client, idx):
    headers = {"X-Scenario": "ingestion_burst"}
    payload = {"event_id": idx, "value": idx % 10}
    resp = await client.post("https://api.example.com/events", json=payload, headers=headers)
    return resp.status_code

async def run_load(concurrency: int, total_requests: int):
    async with httpx.AsyncClient(timeout=5.0) as client:
        tasks = []
        for i in range(total_requests):
            tasks.append(asyncio.create_task(worker(client, i)))
            if len(tasks) == concurrency:
                await asyncio.gather(*tasks)
                tasks.clear()
        if tasks:
            # Flush the remainder when total_requests isn't a multiple of concurrency
            await asyncio.gather(*tasks)

asyncio.run(run_load(concurrency=100, total_requests=5000))

Once I could reliably trigger the bad behavior in a non-production environment, it became much easier to iterate on instrumentation and see whether a change actually reduced blocking.

Structured Logging Around Critical Sections

The next pillar of my strategy was structured logging around suspect code paths. I didn’t want to guess which parts of each request were blocking; I wanted timestamps and durations. So I wrapped key sections—database access, third-party SDK calls, and CPU-heavy transformations—with thin timing utilities that pushed structured events to our log pipeline.

At the FastAPI layer, this looked like a combination of middleware and manual spans inside endpoints. I logged a unique request ID, endpoint name, scenario (from the load test header), and micro-timings for each critical section. Over time I’ve found that this kind of simple, consistent logging beats fancy dashboards when you’re trying to answer a very specific question: “Where is the event loop actually stalling?”
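
The middleware half looked roughly like the sketch below. The X-Request-ID header, logger name, and field names are illustrative assumptions on my part; only the X-Scenario header comes from the load tests described earlier.

import logging
import time
import uuid

from fastapi import Request

perf_logger = logging.getLogger("perf.request")

@app.middleware("http")
async def request_timing(request: Request, call_next):
    # Correlate each request with a load-test scenario and a unique ID
    request_id = request.headers.get("X-Request-ID", str(uuid.uuid4()))
    scenario = request.headers.get("X-Scenario", "none")
    start = time.perf_counter()
    response = await call_next(request)
    duration_ms = (time.perf_counter() - start) * 1000
    perf_logger.info(
        "request_timing",
        extra={
            "request_id": request_id,
            "endpoint": request.url.path,
            "scenario": scenario,
            "duration_ms": duration_ms,
        },
    )
    return response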

Here’s a minimal example of the kind of timing wrapper I used:

import time
import logging

logger = logging.getLogger("perf")

class Timer:
    def __init__(self, name: str, extra: dict | None = None):
        self.name = name
        self.extra = extra or {}

    def __enter__(self):
        self.start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        duration_ms = (time.perf_counter() - self.start) * 1000
        logger.info(
            "section_timing",
            extra={"section": self.name, "duration_ms": duration_ms, **self.extra},
        )

# Usage inside an endpoint

@app.post("/events")
async def ingest_event(payload: dict):
    with Timer("normalize_payload"):
        normalized = normalize_payload_sync(payload)

    with Timer("db_write"):
        save_to_db_sync(normalized)

    return {"status": "ok"}

By aggregating these logs, I started to see a clear pattern: during spikes, certain sections blew up in duration, and when they did, all other coroutines on the same worker stalled alongside them. That was strong circumstantial evidence of event loop blocking.

Event Loop Instrumentation and Blocking Detection

The final piece was direct instrumentation of the event loop itself. I wanted to move beyond circumstantial evidence and actually measure how often, and for how long, the loop was failing to process other tasks. One thing I’ve learned is that even simple loop-level metrics—like the maximum delay between scheduled callbacks—can be incredibly revealing when you overlay them with load test timelines.

In practice, I used a background task in the FastAPI app that periodically scheduled a tiny coroutine and measured how late it ran relative to its expected schedule. Large delays, even with moderate CPU usage, were a smoking gun for blocking code on the loop. I also experimented with dedicated tools for detecting and profiling Python async event loop blocking, which helped validate my homegrown measurements (see also: Background Tasks – FastAPI docs).

A simplified example of a loop-delay monitor inside the app looks like this:

import asyncio
import time
import logging

logger = logging.getLogger("loop")

async def loop_monitor(interval: float = 0.1):
    """Log event loop delay; large values often indicate blocking code."""
    while True:
        expected = time.perf_counter() + interval
        await asyncio.sleep(interval)
        now = time.perf_counter()
        delay_ms = max(0, (now - expected) * 1000)
        if delay_ms > 50:  # threshold tuned for our environment
            logger.warning("event_loop_delay", extra={"delay_ms": delay_ms})

@app.on_event("startup")
async def start_monitor():
    # Keep a reference so the monitor task isn't garbage-collected mid-flight
    app.state.loop_monitor_task = asyncio.create_task(loop_monitor())

When I correlated these loop-delay warnings with structured endpoint timings and load test runs, the picture became unambiguous: we weren’t just slow; we were blocking the FastAPI event loop in a few hot paths. That clarity shaped the rest of the project: instead of guessing at optimizations, I could systematically target the specific sections that correlated with loop delays and latency spikes.

Implementation: Fixing Blocking Calls in FastAPI Async Endpoints


Auditing and Categorizing Blocking Code

Once I had solid evidence of FastAPI event loop blocking, the first implementation step was a careful audit of the hot paths. I went through each frequently hit endpoint and tagged every call as CPU-bound, I/O-bound but async-safe, or I/O-bound and blocking. This forced me to be honest about what was actually safe to run on the loop and what had to move elsewhere.

The main categories that emerged were:

  • Sync ORM operations: database reads/writes using a synchronous ORM that wrapped psycopg2.
  • Legacy HTTP clients and SDKs: blocking requests to internal services (auth, billing, feature flags).
  • CPU-heavy transformations: JSON normalization, validation, and signature checks that spiked CPU for tens of milliseconds per request.

In my experience, just writing this down in a small design doc helped align the team. Instead of arguing about whether “async is worth it,” we could point to specific lines of code that were objectively blocking the loop and decide, case by case, how to isolate them.

Offloading Synchronous Work to Thread Pools

The quickest wins came from moving unavoidable synchronous work into thread pools. Rather than rewriting our entire ORM layer or replacing every SDK, I wrapped the calls with asyncio.to_thread so they would run in a separate thread and free up the event loop to keep serving other requests.

I started with a small utility to standardize this pattern:

import asyncio
from typing import Callable, TypeVar

T = TypeVar("T")

async def run_in_thread(func: Callable[..., T], *args, **kwargs) -> T:
    """Run blocking code in a thread to avoid event loop blocking."""
    return await asyncio.to_thread(func, *args, **kwargs)

Then I incrementally refactored endpoints. A simplified before/after for the ingestion endpoint looked like this:

# Before: all blocking on the event loop
@app.post("/events")
async def ingest_event(payload: dict):
    normalized = normalize_payload_sync(payload)
    save_to_db_sync(normalized)
    billing_client.notify_event_sync(normalized["customer_id"])
    return {"status": "ok"}

# After: sync work moved to background threads
@app.post("/events")
async def ingest_event(payload: dict):
    # CPU-heavy normalization off the loop
    normalized = await run_in_thread(normalize_payload_sync, payload)

    # Blocking DB write in thread pool
    await run_in_thread(save_to_db_sync, normalized)

    # Blocking HTTP call via legacy SDK in thread pool
    await run_in_thread(billing_client.notify_event_sync, normalized["customer_id"])

    return {"status": "ok"}

When we deployed this pattern behind a feature flag, I immediately saw loop delay warnings drop and per-pod throughput increase. One lesson I’ve taken from this is that you don’t have to be purist about async: carefully isolating blocking code in thread pools can get you most of the concurrency benefits without a full rewrite.

Adopting Async-Aware Clients for Databases and HTTP

Thread pools helped a lot, but they’re ultimately a bridge solution. For the busiest paths, I wanted true async I/O so we could scale without continually tuning thread pool sizes. Over a couple of iterations, we selectively replaced synchronous clients with async-capable libraries where it offered the biggest payoff.

For HTTP calls to internal services, we moved from a sync requests-based wrapper to httpx.AsyncClient. The pattern was straightforward, but I was careful to centralize client creation and reuse connections:

import httpx
from fastapi import FastAPI

app = FastAPI()

@app.on_event("startup")
async def startup():
    app.state.http_client = httpx.AsyncClient(timeout=5.0)

@app.on_event("shutdown")
async def shutdown():
    await app.state.http_client.aclose()

async def notify_billing_async(customer_id: str):
    client: httpx.AsyncClient = app.state.http_client
    resp = await client.post(
        "https://billing.internal/notify",
        json={"customer_id": customer_id},
    )
    resp.raise_for_status()
    return resp.json()

@app.post("/events")
async def ingest_event(payload: dict):
    # Assume normalization already async-safe here
    customer_id = payload["customer_id"]
    await notify_billing_async(customer_id)
    return {"status": "ok"}

On the database side, we introduced an async gateway for a subset of queries using an async driver and connection pool, then gradually migrated the most latency-sensitive endpoints to use it. We didn’t try to convert every query at once; instead, we focused on the small set of operations that appeared most often in traces during peak load.
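
As an illustration, here is a minimal sketch of that gateway, assuming asyncpg as the driver (the post’s actual library choice, the DSN, and the query below are placeholders rather than our exact setup):

import asyncpg
from fastapi import FastAPI

app = FastAPI()

@app.on_event("startup")
async def init_db_pool():
    # One shared pool per worker; pool sizes here are illustrative
    app.state.pg_pool = await asyncpg.create_pool(
        dsn="postgresql://user:pass@db.internal/analytics",
        min_size=5,
        max_size=20,
    )

@app.on_event("shutdown")
async def close_db_pool():
    await app.state.pg_pool.close()

async def fetch_recent_events(customer_id: str, limit: int = 100):
    async with app.state.pg_pool.acquire() as conn:
        # The await yields control to the loop while Postgres does the work
        return await conn.fetch(
            "SELECT * FROM events WHERE customer_id = $1 ORDER BY ts DESC LIMIT $2",
            customer_id,
            limit,
        )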

Along the way, I leaned on tools for monitoring FastAPI event loop health to validate that these changes really reduced blocking rather than just moving it around (see also: Concurrency and async / await – FastAPI docs).

Refactoring CPU-Bound Logic and Guardrails

The last major step was dealing with CPU-heavy logic that still had to live in the service. Some of it could be optimized (e.g., more efficient JSON parsing, caching expensive intermediate results), but some workloads were inherently CPU-bound. For those, I treated them as background jobs even if they were logically part of request handling.

In the short term, I pushed CPU-bound work into the same thread-pool wrapper used for blocking I/O, but I also added explicit guardrails:

  • Caps on maximum payload size and complexity, with explicit 4xx errors when clients exceeded safe limits (see the sketch after this list).
  • Pre-flight checks to reject obviously pathological requests before doing expensive work.
  • Feature flags so we could quickly disable heavy paths if they started hurting tail latency again.
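
For the payload caps in particular, the cheapest guardrail checks the Content-Length header before any parsing happens. A minimal sketch follows; the 64 KB limit is illustrative, not our actual production value:

from fastapi import Request
from fastapi.responses import JSONResponse

MAX_BODY_BYTES = 64 * 1024  # illustrative cap

@app.middleware("http")
async def enforce_body_limit(request: Request, call_next):
    declared = request.headers.get("content-length")
    if declared and declared.isdigit() and int(declared) > MAX_BODY_BYTES:
        # Reject before any expensive parsing or validation touches the loop
        return JSONResponse(status_code=413, content={"detail": "payload too large"})
    return await call_next(request)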

For a particularly expensive normalization step, I eventually split the operation in two: a lightweight inline validation during request handling, and a deeper normalization + enrichment step that ran asynchronously via a background worker. FastAPI only needed to confirm that the event was accepted and persisted; downstream consumers could wait a little longer for fully normalized data.
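
In shape, the split looked something like the sketch below, using FastAPI’s BackgroundTasks as a stand-in for the actual background worker; validate_event_light, persist_raw_event, and enrich_and_normalize are hypothetical helpers:

from fastapi import BackgroundTasks, HTTPException

def validate_event_light(payload: dict) -> None:
    # Cheap inline checks only; anything expensive stays off the hot path
    if "customer_id" not in payload:
        raise HTTPException(status_code=422, detail="missing customer_id")

def enrich_and_normalize(event_id: str) -> None:
    ...  # deep normalization + enrichment, runs after the response is sent

@app.post("/events")
async def ingest_event(payload: dict, background_tasks: BackgroundTasks):
    validate_event_light(payload)
    # persist_raw_event is sync, so it goes through the thread-pool wrapper
    event_id = await run_in_thread(persist_raw_event, payload)
    # Sync background tasks run in Starlette's thread pool, not on the loop
    background_tasks.add_task(enrich_and_normalize, event_id)
    return {"status": "accepted", "event_id": event_id}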

Putting these patterns together—thread pools for unavoidable sync code, async clients for core I/O, and guardrails around CPU-heavy work—gave us a pragmatic playbook. In my experience, this mix is what makes FastAPI sustainable at high traffic: you get the best parts of async without pretending that all of your dependencies are magically non-blocking.

Results: Throughput, Latency, and Event Loop Health After Fixes


Throughput and Resource Utilization Improvements

Once the changes were rolled out and the dust settled, the first win I noticed was a clear jump in effective throughput per pod. Under the same synthetic load tests I used earlier, we saw:

  • Throughput per pod increase of ~40–60% on the hottest ingestion endpoints.
  • A much smoother RPS vs. concurrency curve: instead of flattening early, throughput scaled almost linearly until we hit CPU limits.
  • More predictable CPU utilization, with fewer sudden spikes tied to blocking sections.

In practice, this meant we could handle the same peak traffic with fewer pods, or significantly higher traffic at the same cost. From a budget perspective, this was the change that finally got leadership to care about FastAPI event loop blocking as more than just an abstract engineering concern. In my experience, it’s hard to argue with a performance fix that pays for itself in infrastructure savings.

Another subtle but important outcome was how evenly load distributed across workers. Previously, some workers would get “stuck” processing a few slow requests while others sat relatively idle. After offloading blocking work and adopting async clients, worker utilization became far more balanced, which directly translated into better aggregate throughput.

Latency, Tail Behavior, and Error Rates

Latency was where the impact was most visible to end users. Across both synthetic tests and real production traffic, we measured:

  • p95 ingestion latency drop from roughly 220–250 ms to 110–140 ms during peak hours.
  • p99 latency reduction from multi-second outliers down to a few hundred milliseconds in most scenarios.
  • A sharp decline in gateway timeouts and client-side retries, especially under bursty traffic.

One thing I’ve learned is that users often don’t notice a 20% improvement in median latency, but they absolutely notice when the long tail gets cleaned up. Support tickets around “stuck dashboards” and “slow webhooks” dropped noticeably once those p99 spikes disappeared. Internally, teams relying on our API for batch jobs stopped building elaborate retry schemes just to cope with occasional slowness.

Error rates followed the same trend. By unblocking the event loop, we indirectly:

  • Reduced upstream timeouts from load balancers and API gateways.
  • Cut down on cascading failures, where slow responses on one service triggered issues in another.
  • Stabilized dependency behavior, since downstream services weren’t hammered by retry storms.

From my perspective, the most satisfying part was watching our canary pods handle new traffic without the characteristic latency “sawtooth” pattern that had become all too familiar during the FastAPI event loop blocking days.

Event Loop Health and Operational Confidence

Instrumenting the event loop turned out to be just as valuable after the fixes as it was during diagnosis. With the loop-delay monitor and structured timings in place, I could finally quantify how “healthy” the loop was over time instead of relying on gut feel.

Post-deployment, the metrics told a clear story:

  • Event loop delay warnings (e.g., delays > 50 ms) dropped by over 90% during peak load windows.
  • Loop delay histograms shifted dramatically left, clustering around a few milliseconds even under heavy concurrency.
  • Correlation between loop delays and endpoint latency largely disappeared; remaining spikes were tied to normal CPU saturation, not blocking I/O.

Having these signals in our dashboards changed how I operated the service day to day. I started treating loop health as a first-class SLI, right alongside RPS and error rate. When we introduced new dependencies or features, I would intentionally run focused load tests and watch the loop-delay metrics first; if they stayed flat, I felt confident we weren’t reintroducing hidden blocking paths.

To help keep it that way, I documented the patterns we used—thread pools, async clients, and guardrails—and shared them with the rest of the team. Combined with a few lightweight tools for monitoring FastAPI event loop health in CI and staging environments, this gave us a sustainable way to keep blocking under control instead of treating it as a one-off firefight (see also: Concurrency and async / await – FastAPI docs).

Overall, the shift was noticeable: instead of fearing traffic spikes, we could simulate them, observe the event loop staying responsive, and roll forward with a lot more confidence.

What Didn’t Work: Failed Attempts to Solve Event Loop Blocking

Throwing More Hardware at the Problem

My first instinct, and honestly the easiest lever to pull, was to scale out. We increased the number of pods, tweaked autoscaling thresholds, and bumped CPU limits. At a glance, this looked promising: overall capacity went up and some timeouts disappeared. But under closer inspection, the core FastAPI event loop blocking symptoms remained—p95 and p99 latencies still spiked under bursty load, and individual workers still showed periods where they were “stuck” even though cluster-wide CPU was fine.

What I learned here is that horizontal scaling doesn’t fix a fundamentally blocked event loop; it just spreads the pain across more instances. We were paying more for compute while still hitting the same concurrency ceiling per worker. That realization pushed me away from infrastructure-only fixes and back toward the code.

Superficial Async Refactors Without Touching Dependencies

Another failed path was what I’d call the “cosmetic async refactor.” Early on, I went through a few endpoints and converted them to async def, sprinkling await where it felt appropriate, but I didn’t change the underlying synchronous dependencies. The ORM was still sync, the HTTP clients were still blocking, and CPU-heavy work still ran directly on the loop.

Unsurprisingly in hindsight, this didn’t meaningfully change behavior. The code looked modern and asynchronous, but the event loop was still being blocked by the same operations. In some cases it was actually worse, because the async veneer gave a false sense of safety and made people more comfortable piling additional work into those endpoints. That experience convinced me that any serious attempt to fix FastAPI event loop blocking has to start with dependency behavior, not just function signatures.

Over-Aggressive Caching and Timeouts

At one point, I tried to reduce perceived slowness by aggressively caching responses and tightening timeouts on downstream calls. The idea was that if dependencies returned faster or were hit less often, the loop would stay healthier. In practice, this created a new class of problems:

  • We served stale or incorrect data to some clients because cache invalidation got complicated under load.
  • Tighter timeouts led to more retries, which amplified traffic to already-struggling downstream services.
  • The underlying blocking behavior didn’t disappear; the event loop was still being held up whenever a cache miss or retry occurred.

One thing I learned the hard way was that you can’t cache or timeout your way out of fundamentally blocking code. Those techniques are useful once the loop is healthy, but as a primary fix they mostly shift where the pain shows up. That’s what finally convinced me to focus on isolating blocking calls (via thread pools and async clients) instead of trying to paper over them with clever caching strategies.

Lessons Learned & Recommendations for Avoiding FastAPI Event Loop Blocking


Designing Endpoints with the Event Loop in Mind

Coming out of this project, my biggest takeaway is that you have to design FastAPI endpoints around the event loop from day one. In my experience, the teams that struggle most with FastAPI event loop blocking are the ones that treat async as a drop-in replacement for sync frameworks, instead of a different mental model entirely.

Now, when I review new endpoints, I start with a simple checklist:

  • Every network hop is async-aware: database, HTTP, queues, and caches all use non-blocking clients where possible.
  • No heavy CPU work in the hot path without an explicit decision to offload it (thread pool or background worker).
  • Clear separation of concerns: request validation, I/O, and business logic are easy to reason about and instrument separately.

One habit that’s helped is to keep an explicit boundary between “loop-safe” and “blocking” code. I mentally categorize functions as pure async, blocking I/O, or CPU-bound, and I expect code to make that visible. For example, I’ll often wrap blocking sections in helper functions so it’s obvious they need special handling:

# Explicitly mark blocking pieces

def write_order_to_db(order: dict) -> None:
    ...  # sync ORM work

async def create_order(order: dict) -> None:
    await run_in_thread(write_order_to_db, order)

Thinking this way keeps me honest about what’s actually running on the loop and makes it much easier to spot trouble during reviews.

Patterns, Anti-Patterns, and Practical Guardrails

Over time, a few concrete patterns and anti-patterns have stood out as particularly important for avoiding FastAPI event loop blocking.

Patterns that have worked well for me:

  • Centralized async clients: Create and reuse async HTTP/database clients at startup instead of instantiating them per request.
  • Thin async endpoints: Keep endpoint functions small; push most logic into services that clearly indicate whether they’re async-safe or blocking.
  • Dedicated helpers for offloading: Use utilities like run_in_thread or a shared executor instead of ad-hoc asyncio.to_thread calls everywhere.
  • Early validation and limits: Reject oversized or pathological requests up front to avoid dragging the loop through expensive work it shouldn’t be doing.

Anti-patterns I actively push back on now:

  • “Async on the surface, sync underneath”: async def endpoints that call blocking ORMs, blocking HTTP libraries, or CPU-heavy code directly.
  • Fire-and-forget tasks without tracking: spawning background tasks with asyncio.create_task and never monitoring or limiting them.
  • Unbounded work per request: loops over user-supplied collections with expensive processing and no hard caps.

One practical guardrail I’ve started using is a simple linting/checklist script that scans the codebase for obvious hazards—like imports of requests in modules that are meant to be async—or time.sleep inside anything reachable from the request path. It’s not perfect, but it catches a surprising number of issues before they hit production.
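
A stripped-down version of that scan is below; the hazard list and the app/ directory layout are assumptions for illustration:

import pathlib
import re
import sys

# Patterns that usually mean blocking code reachable from async paths
HAZARDS = [
    (re.compile(r"^\s*import requests\b", re.M), "sync 'requests' import"),
    (re.compile(r"\btime\.sleep\(", re.M), "time.sleep in possible request path"),
]

def scan(root: str = "app") -> int:
    findings = 0
    for path in pathlib.Path(root).rglob("*.py"):
        text = path.read_text(encoding="utf-8")
        for pattern, message in HAZARDS:
            for match in pattern.finditer(text):
                line_no = text[: match.start()].count("\n") + 1
                print(f"{path}:{line_no}: {message}")
                findings += 1
    return findings

if __name__ == "__main__":
    sys.exit(1 if scan() else 0)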

Team Practices, Reviews, and Ongoing Monitoring

Technically, the fixes were important, but the cultural changes around them have mattered just as much. When I first started working on this, async felt like a niche skill on the team; now we treat it as a shared responsibility.

Concretely, a few practices have helped:

  • Async-focused code reviews: For FastAPI changes, reviewers explicitly ask, “Where could this block the loop?” and look for blocking dependencies.
  • Shared patterns doc: We keep a short internal guide of “approved” patterns for DB access, HTTP calls, and background work in async services.
  • Pre-merge load tests for risky changes: When a PR touches hot paths, we run a small, automated load test in staging and watch loop-delay metrics.

On the monitoring side, I now treat event loop health as part of the standard observability stack. That means:

  • Dashboards for loop delay, request concurrency, and per-endpoint latency (see the wiring sketch after this list).
  • Alerts when loop delay crosses a threshold for sustained periods.
  • Correlation between loop metrics and deploys, so we can quickly blame (or exonerate) new changes.
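
To make that concrete, the loop-delay monitor can feed a histogram directly. Here is a sketch assuming prometheus_client as the metrics library (an assumption; any metrics client with histograms works the same way):

import asyncio
import time

from prometheus_client import Histogram

# Buckets chosen to surface the >50 ms delays worth alerting on
LOOP_DELAY_MS = Histogram(
    "event_loop_delay_ms",
    "Observed event loop scheduling delay in milliseconds",
    buckets=(1, 5, 10, 25, 50, 100, 250, 500),
)

async def loop_monitor(interval: float = 0.1):
    while True:
        expected = time.perf_counter() + interval
        await asyncio.sleep(interval)
        delay_ms = max(0.0, (time.perf_counter() - expected) * 1000)
        LOOP_DELAY_MS.observe(delay_ms)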

When teammates ask how to get started, I usually point them to a few solid tools and libraries for tracking FastAPI event loop behavior, plus examples of how we wired them into our stack (for example, the fastapi-observability repository on GitHub). In my experience, the combination of clear patterns, thoughtful reviews, and lightweight but continuous monitoring is what keeps FastAPI event loop blocking from sneaking back in months after you think you’ve fixed it.

Conclusion / Key Takeaways

From Mysterious Slowdowns to a Clear Diagnosis

Looking back, the hardest part wasn’t changing code—it was proving that FastAPI event loop blocking was the real culprit behind our flaky performance. Once I combined realistic load tests, structured logging, and event loop instrumentation, the picture came into focus: a handful of sync ORM calls, legacy HTTP clients, and CPU-heavy transformations were effectively freezing the loop under load.

By systematically offloading blocking calls to thread pools, adopting async-aware database and HTTP clients for the hottest paths, and putting guardrails around CPU-bound work, we turned the service from brittle to boring—in a good way. Throughput climbed, tail latencies dropped, and event loop delay metrics flattened out. Just as importantly, we came away with a shared mental model of how to treat the event loop as a scarce resource instead of an implementation detail.

Actionable Takeaways for Async Python APIs

For teams building or maintaining FastAPI services, a few concrete lessons stand out from my experience:

  • Treat every dependency as suspect: assume ORMs, HTTP clients, and SDKs are blocking until you’ve confirmed they’re async-safe or isolated in thread pools.
  • Design endpoints around the loop: keep handlers thin, push heavy work out of the hot path, and be explicit about which functions are blocking or CPU-bound.
  • Instrument first, optimize second: add structured timing logs and a simple event loop delay monitor before you start refactoring—otherwise you’re guessing.
  • Prefer targeted async adoption: start by converting the highest-traffic, most latency-sensitive paths to async clients instead of attempting a big-bang rewrite.
  • Bake practices into the team: make “could this block the event loop?” a standard review question, and expose loop health metrics alongside RPS and error rate.

For me, the big shift was realizing that FastAPI performance isn’t just about choosing an async framework; it’s about consistently respecting the event loop in every design and review decision. If you build that habit early, you’re far less likely to end up in the kind of firefight that kicked off this case study.
