Sister post · companion to Parts 1–5

Measure the Loop, Not the Request

Average-latency dashboards will lie about asyncio. Here are the five metrics that don't — and how to export them from inside the loop.

~14 min read Python 3.11+ Assumes: Part 2

If you have one dashboard for an asyncio service and it shows average request latency, you are flying blind. The cooperative trap from Part 2 hides exactly where averages cannot see: the request that landed on a frozen loop sees the entire freeze; every other request that day looks fine. p50 doesn't budge. p99 explodes. The graph that would tell you this is not the request-latency graph at all. It is a graph about the loop itself.

This post is a tool-piece — the smallest set of metrics, hooks, and live-introspection commands that will tell you the truth about an asyncio program in production. Everything in it is built on what we already covered in Part 2; the work here is in pointing the existing machinery at a dashboard.

From Go

Go's runtime ships with most of this for free. runtime/trace, GODEBUG=schedtrace, the scheduler-latency histograms in runtime/metrics — the equivalents of everything below are an env-var away. CPython gives you primitives and a debug flag. The dashboards you want are the ones you build.

§1Why the average lies

The shape of the trap is worth restating because every metric here is designed around it. A single asyncio loop serves all requests on one thread. If one handler does 200 ms of pure-Python work without an await, every other handler waits 200 ms — but only the handlers unlucky enough to be in flight during the freeze see those 200 ms as latency. The rest run through a normal loop and post normal numbers. With realistic traffic and one such freeze per minute, you have a service whose p50 latency is 3 ms, whose p99 is 4 ms, and whose p99.9 is 203 ms. The p99 column on your dashboard does not catch this because p99 is computed over a window of thousands of requests; one bad request per thousand never makes it into the top-1%.

The honest framing is to stop asking "how long did requests take?" and start asking "how long was the loop unavailable?" If the loop was unavailable for 200 ms, that is the bound on every request that wanted to make progress during that window — regardless of whether they ever ran. Loop availability is a property of the scheduler itself, not of any individual handler. We can measure it directly.

§2Loop stall — the metric the loop owes you

The cleanest definition I know is this: schedule a callback to fire every N milliseconds, and record the difference between when it asked to fire and when it actually did. That difference is loop stall. A healthy loop has stalls bounded by timer resolution noise — sub-millisecond on Linux, a couple of ms on macOS. A loop in the cooperative trap has periodic huge stalls. A loop slowly degrading under load has a long tail that grows.

The implementation is fifteen lines:

import asyncio, time

class StallMonitor:
    def __init__(self, interval=0.01, on_stall=None):
        self.interval = interval
        self.on_stall = on_stall  # callback(stall_seconds)
        self._last = None

    async def run(self):
        loop = asyncio.get_running_loop()
        self._last = loop.time()
        while True:
            await asyncio.sleep(self.interval)
            now = loop.time()
            stall = (now - self._last) - self.interval
            if stall > 0 and self.on_stall:
                self.on_stall(stall)
            self._last = now

Fifteen lines. Start it as a Task at program startup; it measures the loop from the inside.

Two subtleties. First, this measures the loop with itself — a fundamentally circular act that nonetheless works, because the canary task is just another callback in _ready. If the loop is frozen, the canary cannot fire; when the freeze ends, the canary fires late and reports the freeze duration. The canary cannot detect a freeze longer than its own runtime, but no asyncio mechanism can, by construction. Second, use loop.time() rather than time.monotonic() — they happen to be the same on Linux, but only the loop's own clock is guaranteed to match the timer wheel.

Wire it to your metrics library and you have the most important graph an asyncio service can have:

from prometheus_client import Histogram

stall_seconds = Histogram(
    "asyncio_loop_stall_seconds",
    "Difference between scheduled and actual fire time of a 10ms canary",
    buckets=(0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5),
)

monitor = StallMonitor(interval=0.01, on_stall=stall_seconds.observe)
asyncio.create_task(monitor.run(), name="stall-monitor")

Diagram A · what stall actually measures

The canary asks to fire every 10 ms. While the loop is free, it fires on time. When a blocking call freezes the loop, the canary fires late — and the delay is exactly the freeze duration.

This single graph is worth more than any p99 chart of request latency. It tells you whether the loop itself is healthy, independent of how much traffic you are serving. Alarm on the upper-tail buckets — anything above 0.05 in normal operation is a problem worth a page.

§3The cheap built-in: debug mode

Before you ship anything custom, ask asyncio itself. The loop has a debug mode that logs callbacks taking longer than a configurable threshold. It costs almost nothing to enable and catches a useful subset of stalls without instrumentation:

# Either of these works:
asyncio.run(main(), debug=True)
# or set PYTHONASYNCIODEBUG=1 in the environment.

# Tune the threshold (default is 100 ms — usually too lenient):
loop = asyncio.get_running_loop()
loop.slow_callback_duration = 0.02     # 20 ms

What this actually does is in Lib/asyncio/base_events.py — every callback invoked from _run_once is timed, and if it exceeds slow_callback_duration, the loop logs a warning at WARNING level with the callback's source location and duration:

Executing <Task pending name='handler-7' coro=<handle() running at app.py:42>
   wait_for=<Future pending>> took 0.187 seconds

Two warnings about the warnings. First, the loop times each callback, not each task. A task that yields cooperatively spends one tick per callback invocation; only the per-tick duration is measured. This is what you want — it catches synchronous CPU work inside an async def, which is precisely the trap from §1. Second, debug mode also enables extra bookkeeping: the loop tracks Task creation sites, slows down some operations to validate them, and reports tasks that are destroyed while still pending. Run it everywhere in development; consider it for staging; in production, lower slow_callback_duration rather than disabling it outright.

Trap

The slow-callback warning fires after the callback has already blocked the loop. It is a postmortem signal, not a prevention. By the time you see the log line, the damage is done. Use it to find the offending code paths in pre-prod; rely on the stall histogram from §2 to alert in prod.

§4The task census

The second class of bug asyncio loves to grow is task leaks. A create_task whose return value is dropped on the floor will run to completion — and if it happens to await something that never resolves (a hung HTTP call, a closed Redis connection, a Future no one will set_result on), it sits in memory holding whatever it captured in its closure forever. With enough traffic and one such leak per request, you have a memory profile that looks like a healthy stairstep until OOM.

The introspection primitive is asyncio.all_tasks(). It returns every task currently alive on the loop. Two things make it usable: naming tasks at creation time, and dumping the set on a signal.

# Always name tasks. The default name is "Task-N" which tells you nothing.
asyncio.create_task(handle(req), name=f"handle:{req.id}")

import signal, sys

def dump_tasks(*_):
    for t in asyncio.all_tasks():
        coro = t.get_coro()
        print(f"{t.get_name():<30} {coro.__qualname__:<30} "
              f"{'done' if t.done() else 'pending'}", file=sys.stderr)

signal.signal(signal.SIGUSR1, dump_tasks)
# kill -USR1 <pid> → instant snapshot of every live task

Naming costs nothing. The SIGUSR1 handler turns a black box into a printable list.

Wire len(asyncio.all_tasks()) to a gauge and alarm on growth that does not match traffic. A task count that climbs monotonically while requests-per-second is flat is a leak — find the unnamed tasks and the call sites that spawned them.

Sidebar

Two messages from the warnings module are worth recognising on sight. RuntimeWarning: coroutine '...' was never awaited means you called an async def and threw the coroutine away — body never ran. Task was destroyed but it is pending means a task you created was garbage-collected while it still had work to do — usually because nobody kept a reference. The first is benign; the second is a leak you got lucky about (the GC reaped the task before it could leak much). Both deserve a fix at the call site.

§5Live introspection — py-spy and aiomonitor

Sometimes the metric tells you something is wrong but the cause is not on any code path you instrumented. For these moments you want to attach to the running process and look around. Two tools do this well, in complementary ways.

py-spy

py-spy reads another process's memory and reconstructs Python stacks without injecting anything. The async-aware mode walks every live coroutine frame:

$ py-spy dump --pid 12345

Thread 12345 (active+gil): "MainThread"
    _run_once (asyncio/base_events.py:1925)
    run_forever (asyncio/base_events.py:607)
    run_until_complete (asyncio/base_events.py:640)

Task: handle:req-abc123  (pending)
    handle (app.py:42)
    fetch_user (db.py:88)
    _wait_for_response (asyncpg/connection.py:1455)

Task: handle:req-def456  (pending)
    handle (app.py:42)
    expensive_parse (parsing.py:114)        # <-- the freeze culprit
    json.loads (json/__init__.py:346)

One snapshot of every task, including the call stack inside the coroutine. Run it during a stall and the culprit's stack tells you what is blocking.

It is also non-invasive — no asyncio cooperation needed, no SDK to install in the target process, no performance impact while not in use. The standard incident-response loop is stall histogram alerts → SSH to host → py-spy dump on the pid → fix at the named function. Add py-spy record for a flamegraph if the cause is distributed across many short calls rather than one long one.

aiomonitor

Where py-spy is read-only and external, aiomonitor is read/write and internal. You instantiate it inside your program and it opens a telnet/HTTP console on a port. From the console you can list tasks, see their stacks, cancel specific tasks, and even drop into an interactive Python shell on the running loop.

import aiomonitor

async def main():
    with aiomonitor.start_monitor(loop=asyncio.get_running_loop(), port=50101):
        await run_app()

# Then, from a shell on the same host:
#   $ telnet localhost 50101
#   monitor >>> ps            # list tasks
#   monitor >>> where 7       # stack of task 7
#   monitor >>> cancel 7      # cancel it
#   monitor >>> console       # drop into a REPL on the loop

It earns its keep for long-running daemons where you want to poke at state without restarting. It pays for itself the first time you cancel a wedged background task without taking the process down. The cost is the surface area: a console on a port is a thing to firewall and authenticate; the README is honest about this. In containerised environments, bind it to localhost and exec into the container.

§6The five metrics worth exporting

If a Prometheus or OpenTelemetry exporter is the right level of abstraction for your stack, the metric set below is the minimum that meaningfully describes an asyncio service. Nothing in it is novel; the value is the small fixed list. Resist adding more until something fails in a way these cannot explain.

metric	type	what it tells you	alarm shape
asyncio_loop_stall_seconds	histogram	The §2 canary. Upper buckets are blocking-call freezes.	p99.9 > 50 ms sustained
asyncio_tasks_total	gauge	`len(all_tasks())`. Growth uncorrelated with RPS is a leak.	derivative > 0 over a long window with flat RPS
asyncio_executor_queue_depth	gauge	How many `run_in_executor` calls are waiting for a worker.	> 0 sustained — pool is undersized
asyncpg_pool_acquire_seconds	histogram	Time spent waiting for a free DB connection.	p99 > one-tenth of your request budget
redis_pool_in_use	gauge	Connections currently checked out. Saturation means waits.	> 90% of pool size sustained

The first one is the canary from §2. The second is one line. The third requires reading into your executor — ThreadPoolExecutor doesn't expose queue depth directly, so wrap it or sample its internal _work_queue.qsize(). The last two are framework-specific but the pattern generalises to any pooled resource: every pool has an acquire time and an in-use count; export both.

Two metrics conspicuously absent: request latency and request count. Export those too — but as application metrics, not asyncio metrics. The point of this set is that it diagnoses the loop independently of what is running on it. If the loop is healthy and requests are slow, the bug is elsewhere (a slow database, a chatty downstream). If the loop is unhealthy, no amount of application-level optimisation will help.

Loop stall is a property of the scheduler, not of any handler. Measure the scheduler.

§7Tracing across `await`

The last piece is distributed tracing, which interacts with asyncio in one specific and important way: the trace context has to survive every await. The answer is contextvars, the threading-local-but-for-coroutines abstraction added in Python 3.7. When the loop schedules a task to run, it captures the current context (with copy_context()) and restores it when the task resumes. This means a ContextVar set in the request handler is visible inside every coroutine that handler awaits, transitively, all the way down — until someone explicitly resets it or the task ends.

OpenTelemetry's Python SDK uses this directly. opentelemetry.context is a thin layer over contextvars.ContextVar, which means trace and span IDs propagate across await for free, without any per-library instrumentation:

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

async def handle(req):
    with tracer.start_as_current_span("handle"):
        user = await fetch_user(req.user_id)    # span propagates here
        return await render(user)            # and here

The one place this breaks is when you cross out of the loop — run_in_executor, a raw thread, a process pool. The worker thread does not share contextvars with the loop thread, so the trace context is lost unless you propagate it yourself. OTel's auto-instrumentation handles this for the common stdlib executor, but if you spawn threads directly, you need contextvars.copy_context().run(fn, ...) on the worker side.

Cross-context drift

The asymmetry is worth internalising. Within the loop, contextvars propagate across await automatically — the loop captures and restores context on every Task switch. Across the loop boundary (executor, subprocess, message queue) you are on your own. Anything you want preserved must be serialised into the work item or copied into the worker thread by hand. Almost every "we lost the trace ID" production mystery is one of these boundary crossings, unguarded.

§8A short list to take with you

Average request latency lies about asyncio. Measure loop stall — the lateness of a 10 ms canary — and graph the upper buckets.
Enable debug=True in dev and staging; lower slow_callback_duration to 20 ms; treat the warnings as a queue of bugs.
Name every task. Wire len(asyncio.all_tasks()) to a gauge. Install a SIGUSR1 handler that dumps the task table.
Keep py-spy on every host. py-spy dump --pid X during a stall almost always names the offender.
Five metrics: stall histogram, task count, executor queue depth, pool acquire times, pool in-use. Everything else is conditional.
contextvars propagate across await for free, across thread/process boundaries for nothing. Trace IDs follow the same rule.

Read alongside

Part 2 — The Loop is the Trick

Every metric above is a thin wrapper over machinery from Part 2: the cooperative trap motivates the stall canary; _run_once is what slow_callback_duration times; the Task spine is what all_tasks() walks. If a metric here makes no sense, the corresponding paragraph in Part 2 will fix it. If a paragraph in Part 2 felt abstract, the metric here is what it looks like in production.