Detecting Blocking Tasks in Asyncio by Measuring Event Loop Latency
Asyncio only works if every coroutine cooperates. One blocking call can freeze your entire app. This post shows a simple watchdog coroutine that measures event loop latency, detects blocking tasks early, and turns invisible stalls into actionable metrics.
Asyncio gives you lightweight concurrency, but only as long as every coroutine cooperates. One blocking call (just one) can freeze the entire application. When that happens, all your “concurrent” tasks stall at the same time: HTTP handlers slow down, background jobs drift, timeouts trigger late, and nothing looks obviously wrong.
There’s an easy way to catch this: run a tiny coroutine that repeatedly sleeps and checks how late it wakes up. If the event loop is stuck, this coroutine wakes up late, and that delay becomes your signal. Add metrics and a graph, and you have a reliable early-warning system for any blocking task in your async code.
This post walks through why blocking the event loop is so disruptive, how to measure it, and how to turn that measurement into something actionable in production.
Asyncio in practice: cooperative multitasking
Under the hood, asyncio is cooperative. The event loop runs a set of tasks, and each task must explicitly yield control with await. When it does, the loop schedules other tasks or handles I/O events:
    while True:
        task = pick_next_ready_task()
        task.run_until_next_await()
Because everything shares a single event loop thread, concurrency works only if tasks frequently yield. If they don’t, nothing else progresses.
This is the catch: a coroutine that performs blocking I/O or heavy CPU work without await prevents the loop from serving other coroutines. The entire application stalls, often in ways that are difficult to identify solely from logs.
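To make the scheduling loop above concrete, here is a toy round-robin scheduler built on plain generators, where `yield` stands in for `await`. It is only an analogy for how the real event loop behaves, and the names `worker` and `run_round_robin` are made up for this illustration.

```python
from collections import deque


def worker(name, steps, log):
    """A cooperative task: does one unit of work, then yields."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # hand control back to the scheduler (analogous to await)


def run_round_robin(tasks):
    """Run each ready task until its next yield, then move to the next one."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)        # run until the next yield
        except StopIteration:
            continue          # task finished, drop it
        ready.append(task)    # still alive, queue it again


log = []
run_round_robin([worker("a", 2, log), worker("b", 2, log)])
print(log)  # ['a:0', 'b:0', 'a:1', 'b:1']
```

Each worker gets a turn only because the other one yields. If a worker ran a long loop between yields, every other task would wait for it, which is the failure mode described above.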
What blocking looks like in reality
Blocking usually comes from:
- time.sleep() inside async code
- synchronous HTTP or database clients
- CPU-heavy work (compression, JSON parsing, regex)
- filesystem operations done synchronously
Timeline example:
t = 0 ms everything healthy
t = 10 ms coroutine calls time.sleep(0.5)
t = 10–510 ms event loop frozen
t = 510 ms loop resumes, all tasks run late
Every time-based operation slips. A task meant to run every 20 ms might fire 500 ms late. An HTTP request that usually takes 5 ms now takes 505 ms.
Unless you’re monitoring the event loop itself, this is almost invisible.
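The timeline can be reproduced with a small experiment. The sketch below starts a 10 ms timer and measures how late it fires while another coroutine either blocks with `time.sleep` or yields with `asyncio.sleep`. The helper names (`timer_lateness`, `blocking`, `cooperative`) and the 200 ms duration are illustrative choices, not part of any library.

```python
import asyncio
import time


async def timer_lateness(blocker) -> float:
    """Return how late a 10 ms timer fires while `blocker` runs next to it."""
    loop = asyncio.get_running_loop()

    async def timer() -> float:
        start = loop.time()
        await asyncio.sleep(0.01)
        return loop.time() - start - 0.01  # seconds past the deadline

    timer_task = asyncio.create_task(timer())
    await asyncio.sleep(0)   # yield once so the timer starts its sleep
    await blocker()          # run the code under test
    return await timer_task


async def blocking():
    time.sleep(0.2)           # freezes the whole loop for about 200 ms


async def cooperative():
    await asyncio.sleep(0.2)  # yields, so the timer can fire on time


late_blocking = asyncio.run(timer_lateness(blocking))
late_cooperative = asyncio.run(timer_lateness(cooperative))
print(f"blocking:    timer fired {late_blocking * 1000:.0f} ms late")
print(f"cooperative: timer fired {late_cooperative * 1000:.0f} ms late")
```

Exact numbers depend on the machine, but the blocking variant should report a lateness close to the length of the `time.sleep` call, while the cooperative variant should stay near zero.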
A simple trick: measure loop latency with a watchdog coroutine
The idea:
- Record the current time.
- await asyncio.sleep(dt)
- Measure how late the coroutine wakes up.
If the event loop is healthy, the delay is small. If something blocks the loop, the wakeup is late by roughly the length of the stall, and the watchdog reports it as soon as the loop runs again.
latency = actual_wakeup - expected_wakeup
Implementing a loop latency watchdog
    import asyncio
    import logging

    log = logging.getLogger(__name__)


    async def loop_latency_watchdog(
        interval: float = 0.02,       # sample every 20 ms
        warn_threshold: float = 0.1,  # warn above 100 ms of lateness
    ) -> None:
        loop = asyncio.get_running_loop()
        while True:
            started_at = loop.time()
            await asyncio.sleep(interval)
            ended_at = loop.time()
            # Lateness = actual wakeup - expected wakeup (started_at + interval).
            latency = max(0.0, ended_at - started_at - interval)
            if latency > warn_threshold:
                log.warning(
                    "Event loop latency high: %.3f s (interval=%.3f s)",
                    latency,
                    interval,
                )
Start it at application boot:
    async def main():
        # Keep a reference so the task is not garbage-collected mid-run.
        watchdog = asyncio.create_task(loop_latency_watchdog())
        await start_your_app()


    asyncio.run(main())
Seeing a spike: a blocking example
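    import asyncio
    import logging
    import time

    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger(__name__)


    async def blocking_task():
        while True:
            await asyncio.sleep(1)
            log.info("Blocking the loop for 500ms…")
            time.sleep(0.5)  # synchronous sleep: nothing else can run
            log.info("Back from blocking sleep")


    async def main():
        # loop_latency_watchdog is the coroutine defined in the previous section.
        # Keep references so the tasks are not garbage-collected mid-run.
        tasks = [
            asyncio.create_task(loop_latency_watchdog()),
            asyncio.create_task(blocking_task()),
        ]
        await asyncio.Event().wait()  # run until interrupted


    asyncio.run(main())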
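Example log (exact values vary from run to run):
    INFO Blocking the loop for 500ms…
    INFO Back from blocking sleep
    WARNING Event loop latency high: 0.502 s (interval=0.020 s)
The warning appears after "Back from blocking sleep" because the watchdog cannot run until the blocking task reaches its next await.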
A perfect demonstration of loop freeze → watchdog spike.
Graphing and alerting
Once you export latency to metrics, you can graph:
- p50/p95/p99 loop latency
- correlation with HTTP latency
- per-endpoint patterns during stalls
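If you do not have a metrics client wired in yet, percentiles can be approximated in-process. The sketch below keeps a rolling window of samples and computes nearest-rank percentiles. `LatencyWindow` is a hypothetical helper for illustration, and in production you would more likely feed the samples to a histogram in your metrics library.

```python
import math
from collections import deque


class LatencyWindow:
    """Rolling window of latency samples with nearest-rank percentiles."""

    def __init__(self, maxlen: int = 1000):
        # Old samples fall off automatically once the window is full.
        self._samples = deque(maxlen=maxlen)

    def record(self, latency: float) -> None:
        self._samples.append(latency)

    def percentile(self, p: float) -> float:
        """Return the p-th percentile (0 < p <= 100), or 0.0 if empty."""
        if not self._samples:
            return 0.0
        ordered = sorted(self._samples)
        rank = max(1, math.ceil(p * len(ordered) / 100))
        return ordered[rank - 1]


window = LatencyWindow()
for ms in range(1, 101):          # synthetic samples: 1 ms .. 100 ms
    window.record(ms / 1000)
print(window.percentile(50), window.percentile(99))  # 0.05 0.099
```

In the watchdog, calling `window.record(latency)` on every iteration (not only when the threshold is exceeded) would give the percentiles a complete picture.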

Useful alerts:
- loop latency > 100ms for 30 seconds
- p99 latency above 200ms
This gives you a clean separation between “system is slow” and “event loop is frozen.”
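An alert such as "loop latency > 100ms for 30 seconds" is usually expressed as a rule in the monitoring system, but the logic is simple enough to sketch in code. `SustainedBreach` below is a hypothetical helper: it fires only once latency has stayed above the threshold for the whole hold period, and resets on the first healthy sample.

```python
class SustainedBreach:
    """Fires once latency has stayed above `threshold` for `hold` seconds."""

    def __init__(self, threshold: float = 0.1, hold: float = 30.0):
        self.threshold = threshold
        self.hold = hold
        self._breach_started = None  # time of the first sample above threshold

    def update(self, latency: float, now: float) -> bool:
        """Feed one sample taken at time `now`; return True if the alert fires."""
        if latency <= self.threshold:
            self._breach_started = None  # healthy sample resets the timer
            return False
        if self._breach_started is None:
            self._breach_started = now
        return now - self._breach_started >= self.hold


alert = SustainedBreach(threshold=0.1, hold=30.0)
print(alert.update(0.2, now=0.0))   # False: breach just started
print(alert.update(0.2, now=30.0))  # True: above threshold for 30 s
```

Passing `now` in explicitly (for example from `loop.time()`) keeps the helper easy to test without waiting for real time to pass.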
Tuning the sampling interval
Shorter intervals catch smaller stalls but generate more metrics.
- 5–10 ms: low-latency apps, high resolution.
- 20–50 ms: good default.
- 100 ms+: lowest overhead, detects only large stalls.
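The effect of the interval can be worked out with simple arithmetic. If a stall begins while the watchdog is sleeping, the watchdog only sees the part of the stall that extends past its scheduled wakeup. The function below is a back-of-the-envelope model under that assumption (times in whole milliseconds, stall starting during the sleep), and `observed_lateness_ms` is a name invented for this example.

```python
def observed_lateness_ms(stall_start: int, stall_duration: int, interval: int) -> int:
    """Lateness seen by one watchdog sample.

    All values are milliseconds. `stall_start` is measured from the moment
    the watchdog began its sleep and is assumed to be <= `interval`.
    """
    return max(0, stall_start + stall_duration - interval)


# A 15 ms stall that starts 2 ms into a 20 ms sleep ends before the wakeup.
print(observed_lateness_ms(2, 15, 20))   # 0  (missed entirely)
# The same stall starting 18 ms into the sleep delays the wakeup.
print(observed_lateness_ms(18, 15, 20))  # 13
# With a 5 ms interval, most of the stall is visible.
print(observed_lateness_ms(2, 15, 5))    # 12
```

In other words, a single sample can underestimate a stall by up to one interval, which is why shorter intervals give higher resolution.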
A solid starting point:
interval = 0.02 s # 20ms
warn_threshold = 0.1 s
After detection: finding the culprit
The watchdog tells you when the loop was blocked. To figure out why, you need to correlate with:
- request logs
- CPU spikes
- GC activity
- stack dumps
- profiling
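One built-in aid worth trying first is asyncio's debug mode, which logs any callback or task step that runs longer than `loop.slow_callback_duration` (100 ms by default) to the `asyncio` logger. The sketch below enables it and captures the report in memory. `ListHandler` and `handler_with_hidden_block` are illustrative names, and the 50 ms threshold is an arbitrary choice.

```python
import asyncio
import logging
import time


class ListHandler(logging.Handler):
    """Collects formatted log messages in a list for inspection."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


async def handler_with_hidden_block():
    time.sleep(0.2)          # the blocking call we want to find
    await asyncio.sleep(0)


async def main():
    loop = asyncio.get_running_loop()
    loop.slow_callback_duration = 0.05  # report steps slower than 50 ms
    await asyncio.create_task(handler_with_hidden_block())


captured = ListHandler()
logging.getLogger("asyncio").addHandler(captured)
asyncio.run(main(), debug=True)  # debug mode enables slow-callback reports

slow_reports = [m for m in captured.messages if "took" in m]
for report in slow_reports:
    print(report)
```

The report should identify the task whose step was slow, which narrows the search considerably. Debug mode adds overhead, so it is better suited to staging or short diagnostic runs than to permanent production use.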
Typical fixes:
- move CPU-heavy operations to run_in_executor
- switch synchronous clients to async versions
- isolate expensive tasks into worker services
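As a minimal sketch of the first fix, the example below moves a blocking function onto the default thread pool with `loop.run_in_executor` and checks that a short timer still fires promptly in the meantime. `slow_checksum` is a made-up stand-in for whatever blocking call you find, with `time.sleep` simulating the slow part.

```python
import asyncio
import hashlib
import time


def slow_checksum(data: bytes) -> str:
    """Synchronous function that would stall the loop if called directly."""
    time.sleep(0.2)  # stand-in for blocking I/O or a slow library call
    return hashlib.sha256(data).hexdigest()


async def main():
    loop = asyncio.get_running_loop()
    start = loop.time()
    # None selects the loop's default ThreadPoolExecutor.
    pending = loop.run_in_executor(None, slow_checksum, b"payload")
    await asyncio.sleep(0.01)            # the loop keeps serving timers
    timer_elapsed = loop.time() - start  # should be close to 10 ms
    digest = await pending               # wait for the worker thread
    return digest, timer_elapsed


digest, timer_elapsed = asyncio.run(main())
print(f"timer woke after {timer_elapsed * 1000:.0f} ms, digest={digest[:12]}…")
```

Threads work well for blocking I/O. For CPU-bound pure-Python work, the GIL limits what a thread can offload, so passing a `concurrent.futures.ProcessPoolExecutor` as the first argument is usually the better choice.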
Takeaways
- Asyncio gives concurrency only if tasks cooperate.
- A single blocking call can freeze the entire application.
- A tiny watchdog coroutine reliably detects loop stalls.
- With metrics and alerts, you catch blocking tasks right after they happen.
This type of lightweight guardrail provides immediate feedback and helps surface bugs that would be nearly invisible otherwise.