Python Concurrency · Part 3 of 7 ●●●○○○○

Asyncio Sidequests

The pieces of asyncio that don't fit the clean scheduler story but matter in production: self-pipes, signals, debug mode, the error messages, the policy mess, contextvars, and uvloop.

~14 min read Python 3.11+ Assumes: Part 2

Part 2 left a thread loose. We argued that the loop can sleep forever in epoll_wait because, in a strictly single-threaded program with no signal handlers and no timers, nothing else can produce work. Real programs have all three. A worker in run_in_executor finishes its job on another thread; Ctrl-C sends a signal that has to be handled on the loop thread; arbitrary code calls loop.call_soon_threadsafe from wherever. Each of these needs a way to wake a loop that is currently parked in a syscall with no Python code running. This part covers the mechanism that does it, plus the half-dozen other production details that don't fit the clean Part 2 narrative but earn their own paragraphs.

Treat this as an appendix you can dip into. Each section is independent. Read the ones that catch your eye; ignore the rest until they bite you.

§1 The self-pipe wakes the loop from outside

The argument from Part 2 §4 has a hole, and the hole is exactly the size of every useful program. We claimed that when both queues are empty and the selector blocks forever, sleeping is safe because nothing else can produce work. That holds for a strictly single-threaded loop with no signals. A real program has at least three external sources of work the loop must notice: executor threads finishing, Unix signals, and explicit call_soon_threadsafe from other threads. None of these run on the loop thread; none of them appear as fd readiness events on the fds the loop is watching. If the loop is parked in epoll_wait, none of them can make it return.

The fix predates asyncio and predates Python: it is the self-pipe trick, a Unix idiom that I first saw attributed to D. J. Bernstein in the late 90s. At startup, the loop creates a connected pair of file descriptors and registers the read end with its own selector. The classic form uses os.pipe(); CPython's selector loop uses socket.socketpair(), so its "pipe" is really a pair of connected sockets. The loop now watches one fd it owns itself. To wake the loop from outside, you write a single byte to the write end. The kernel marks the read end readable; epoll_wait returns with that fd in the ready list; the loop drains the byte and notices it has new work in its in-memory queues. Cost per wakeup: one write syscall, one byte of kernel memory, one read to drain. CPython sets this up in Lib/asyncio/selector_events.py::BaseSelectorEventLoop._make_self_pipe; the read callback just discards whatever bytes are sitting there.
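The trick is small enough to sketch in plain Python. What follows is a minimal illustration under stated assumptions, not CPython's code: the class name TinyWaker and its method names are invented for this example, and a threading.Timer stands in for whatever lives outside the loop thread.

```python
import selectors
import socket
import threading
import time


class TinyWaker:
    """Hypothetical, minimal self-pipe: one selector watching its own fd."""

    def __init__(self):
        self._selector = selectors.DefaultSelector()
        # Two connected ends; the loop reads one, anyone may write the other.
        self._rsock, self._wsock = socket.socketpair()
        self._rsock.setblocking(False)
        self._wsock.setblocking(False)
        self._selector.register(self._rsock, selectors.EVENT_READ)

    def wake(self):
        """Safe to call from any thread: makes the blocked select() return."""
        try:
            self._wsock.send(b"\0")
        except BlockingIOError:
            pass  # buffer full, so a wakeup is already pending

    def sleep_until_woken(self, timeout=None):
        """Block in the selector; return True if woken, False on timeout."""
        if not self._selector.select(timeout):
            return False
        try:
            while self._rsock.recv(4096):
                pass  # drain every pending byte; the content is irrelevant
        except BlockingIOError:
            pass  # nothing left to read
        return True

    def close(self):
        self._selector.close()
        self._rsock.close()
        self._wsock.close()


def demo():
    waker = TinyWaker()
    # The timer thread plays the role of an executor worker finishing.
    threading.Timer(0.05, waker.wake).start()
    start = time.monotonic()
    woken = waker.sleep_until_woken(timeout=5.0)
    elapsed = time.monotonic() - start
    waker.close()
    return woken, elapsed
```

Calling demo() should return well before the five-second timeout, because the single byte written by the timer thread is what makes select() return.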

Diagram A · waking the loop from outside

The loop is asleep in epoll_wait. A worker thread finishes and calls call_soon_threadsafe, which writes one byte to the self-pipe whose read end the loop has registered with its selector. The kernel marks the fd readable; epoll_wait returns; the loop drains the byte and runs the queued callback on its own thread.

With this in place, call_soon_threadsafe(cb) is almost trivial: append the Handle to _ready, then write a byte to the self-pipe. CPython takes no explicit lock around the append; it relies on deque.append being atomic. The write is the load-bearing line. Without it, the deque grows and the loop sleeps through the addition. With it, the loop wakes within microseconds of the cross-thread call, sees _ready is non-empty, and runs the callback on its own thread. The non-threadsafe call_soon skips the write entirely: it assumes the caller is already on the loop thread and that whatever path got them here will return to the loop naturally. Calling call_soon from another thread is therefore broken even though the append itself is atomic, because the loop never learns that it happened.
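A small experiment makes the thread hand-off visible. This is a sketch using only public asyncio and threading calls; the helper names (worker, on_loop_thread, demo) are made up for the example.

```python
import asyncio
import threading


async def main():
    loop = asyncio.get_running_loop()
    done = asyncio.Event()
    seen = {"loop_thread": threading.get_ident()}

    def on_loop_thread():
        # Runs as an ordinary loop callback, so touching loop state is fine.
        seen["callback_thread"] = threading.get_ident()
        done.set()

    def worker():
        # Runs on a foreign thread: the threadsafe variant queues the
        # callback and wakes the sleeping loop.
        seen["worker_thread"] = threading.get_ident()
        loop.call_soon_threadsafe(on_loop_thread)

    thread = threading.Thread(target=worker)
    thread.start()
    await asyncio.wait_for(done.wait(), timeout=5)
    thread.join()
    return seen


def demo():
    return asyncio.run(main())
```

The callback is requested on the worker thread but executes on the loop thread, which is why it can safely set an asyncio.Event.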

On Linux the same job can be done with eventfd (Linux 2.6.22+, 2007), which is strictly nicer: one fd instead of two, an atomic 64-bit counter instead of a byte stream, no risk of the pipe buffer filling under pathological wakeup floods. CPython's stdlib doesn't reach for it directly — the self-pipe is portable to BSD and macOS and the difference doesn't show up in any realistic workload — but uvloop (which is libuv underneath) does. If you see eventfd in an asyncio program's strace, that's why.
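The counter semantics are easy to see from Python, which has exposed eventfd through the os module since 3.10 on Linux. The function name below is hypothetical and the sketch simply returns None on platforms without os.eventfd.

```python
import os


def coalesced_wakeups(n):
    """Signal an eventfd n times, then drain it with a single read."""
    if not hasattr(os, "eventfd"):
        return None  # not Linux, or Python older than 3.10
    fd = os.eventfd(0, os.EFD_NONBLOCK)
    try:
        for _ in range(n):
            os.eventfd_write(fd, 1)   # each write adds to the 64-bit counter
        return os.eventfd_read(fd)    # one read returns the sum and resets it
    finally:
        os.close(fd)
```

However many wakeups arrive while the loop is busy, they collapse into one number and one read, which is why the buffer cannot fill the way a pipe's can.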

The contract

Every line of asyncio code assumes it runs on the loop thread — no locks around the deque, no atomicity around Task state, no thread-local protection on contextvars. The self-pipe is not an optimisation; it is the contract. Everything that touches loop state must be funnelled back through the loop thread, and the self-pipe is the funnel.

§2 Signals piggyback on the same wire

The same mechanism solves a problem that has nothing obviously to do with threading: how does Python deliver a signal to an asyncio program? The C-level signal handler runs at an arbitrary instruction boundary, possibly inside a refcount update, possibly inside a malloc. The interpreter is not reentrant. The handler cannot run Python code. So what does it do?

It writes one byte to a file descriptor. signal.set_wakeup_fd(fd) is the public knob — it tells CPython "when any signal arrives, write a byte to this fd before returning from the handler." asyncio sets that fd to the write end of its own self-pipe. The C handler does its single write() and returns. The loop wakes from epoll_wait, sees the self-pipe is readable, drains the byte, and dispatches the Python-level signal handler synchronously on its own thread.

import asyncio, signal

async def main():
    loop = asyncio.get_running_loop()
    loop.add_signal_handler(signal.SIGINT, lambda: print("caught on loop thread"))
    await asyncio.sleep(3600)         # Ctrl-C still responsive

asyncio.run(main())

add_signal_handler installs a Python callback for the signal and ensures the wakeup fd is wired through the self-pipe.
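The wakeup-fd half of this can be watched in isolation, without asyncio. The sketch below is an illustration only: the function name is invented, it assumes a Unix platform with SIGUSR1, and it must run in the main thread because signal.signal and signal.set_wakeup_fd require that.

```python
import select
import signal
import socket


def wakeup_byte_for(signum=None, timeout=5.0):
    """Raise a signal at ourselves and return the byte CPython wrote."""
    if signum is None:
        signum = signal.SIGUSR1  # Unix-only
    rsock, wsock = socket.socketpair()
    rsock.setblocking(False)
    wsock.setblocking(False)  # set_wakeup_fd requires a non-blocking fd

    # A Python-level handler must exist, otherwise the default action
    # for SIGUSR1 would terminate the process.
    old_handler = signal.signal(signum, lambda signo, frame: None)
    old_fd = signal.set_wakeup_fd(wsock.fileno())
    try:
        signal.raise_signal(signum)
        # This readiness event is what makes a loop's selector return.
        readable, _, _ = select.select([rsock], [], [], timeout)
        return rsock.recv(1) if readable else b""
    finally:
        signal.set_wakeup_fd(old_fd)
        if old_handler is None:
            old_handler = signal.SIG_DFL
        signal.signal(signum, old_handler)
        rsock.close()
        wsock.close()
```

The byte that arrives is the signal number itself, which is how a loop reading the fd can tell which handler to dispatch.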

The contrast with plain signal.signal is sharper than it looks. A handler installed with signal.signal(SIGINT, handler) always runs in the main thread, but it runs between two arbitrary bytecode instructions, in the middle of whatever that thread happened to be doing, possibly halfway through a loop callback. In asyncio, add_signal_handler guarantees the callback runs on the loop thread as an ordinary loop callback, never in the middle of another one. This matters because the handler will likely want to touch loop state (cancel tasks, set events, schedule shutdown), and that state is neither thread-safe nor reentrant. The self-pipe makes the guarantee free.

This is also why Ctrl-C in an asyncio program is so responsive. The loop is never more than one epoll_wait return away from noticing. The handler scribbles a byte, the kernel marks the fd readable, the syscall returns, and the loop dispatches. No polling, no checking, no PyErr_CheckSignals hot path. Just one byte through a pipe and the scheduler picks it up.

§3 Debug mode is a bundle of checks, not a setting

Setting PYTHONASYNCIODEBUG=1 or calling asyncio.run(main(), debug=True) is often described as "turn on extra warnings." That undersells it. Debug mode flips on roughly seven distinct mechanisms inside the loop, each of which catches a different category of bug. Knowing what each one does lets you decide whether the cost of running with debug on is acceptable for your environment.

The full inventory, from Lib/asyncio/base_events.py and friends:

# 1. Slow-callback timing: each callback is timed; anything over
#    slow_callback_duration (default 100 ms) logs a WARNING with source.

# 2. Coroutine origin tracking: sys.set_coroutine_origin_tracking_depth(n)
#    records the call stack at the point each coroutine was created. The
#    'never awaited' warning then includes the creation traceback.

# 3. Resource warning escalation: un-awaited coroutines and pending tasks
#    that get garbage-collected log loudly, with their origin.

# 4. Cross-thread call detection: call_soon checks that it is running
#    on the loop thread; raises if not. (Without debug, this is silent
#    corruption.) This is the single most valuable check.

# 5. Callback sanity checks: call_soon and friends reject a coroutine
#    object or coroutine function passed where a plain callback belongs.
#    (Errors such as set_result on a finished Future are raised with or
#    without debug mode.)

# 6. Selector timing: the execution time of the I/O selector is logged
#    when a poll takes too long.

# 7. SSL handshake timing: the start and duration of each TLS handshake
#    are logged. (The handshake timeout itself applies with or without
#    debug mode.)

Number four is the one that earns its keep in development. Without debug mode, calling loop.call_soon from a worker thread silently appends to the deque and then waits for someone to wake the loop — which nobody will, because you forgot to use the threadsafe variant. The bug manifests as "my callback never ran" hours later when traffic shifts. With debug mode, the same call raises immediately at the offending line. The check is one threading.get_ident() comparison and it has caught more production bugs in my experience than any other asyncio feature.
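The check is easy to trigger on purpose. The sketch below makes the mistake deliberately from a worker thread and collects the exception; the helper names are invented for the example.

```python
import asyncio
import threading


async def main():
    loop = asyncio.get_running_loop()
    caught = []

    def worker():
        try:
            # Wrong on purpose: the non-threadsafe variant from a foreign thread.
            loop.call_soon(print, "scheduled from the wrong thread")
        except RuntimeError as exc:
            caught.append(exc)

    thread = threading.Thread(target=worker)
    thread.start()
    # Join in the default executor so the loop itself is not blocked.
    await loop.run_in_executor(None, thread.join)
    return caught


def demo():
    return asyncio.run(main(), debug=True)
```

With debug=True the list comes back holding one RuntimeError raised at the call_soon line. With debug off, the same call would return a Handle and fail silently.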

The cost of running with debug on in production is the slow-callback timing: every callback pays for a loop.time() call (time.monotonic underneath) on entry and exit. On a busy server processing tens of thousands of callbacks per second, this is measurable but not catastrophic. Recommendation: debug everywhere in development; debug with slow_callback_duration = 0.02 in staging; in production, either debug-on with a higher threshold or debug-off with the canary monitor from the measurement post doing the equivalent work.
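To see what the slow-callback warning looks like with the staging threshold, here is a sketch that blocks the loop on purpose and captures what the "asyncio" logger emits. ListHandler and demo are names made up for this example.

```python
import asyncio
import logging
import time


class ListHandler(logging.Handler):
    """Collect formatted log messages in a list."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


async def main():
    loop = asyncio.get_running_loop()
    loop.slow_callback_duration = 0.02   # the 20 ms staging threshold
    time.sleep(0.05)                     # deliberately block the loop
    await asyncio.sleep(0)


def demo():
    handler = ListHandler()
    logger = logging.getLogger("asyncio")
    logger.addHandler(handler)
    try:
        asyncio.run(main(), debug=True)
    finally:
        logger.removeHandler(handler)
    return handler.messages
```

The captured messages should include one reporting how long the offending Task step took, which is the line to grep for in staging logs.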

§4 Six errors you will see, and what they mean

The asyncio surface has a small, finite set of error and warning messages that experienced users learn to read at a glance. New users see them and panic. They all have specific, mechanical causes — and reading them back to the mechanism they reveal is the fastest way to internalise how the runtime actually works.

message	what really happened	fix
coroutine '…' was never awaited	You called an `async def` function and threw the coroutine object away. Body never ran.	`await` it, or wrap with `asyncio.create_task` and keep a reference.
Task was destroyed but it is pending	A Task you created was garbage-collected while still suspended. You did not keep a reference, and the loop itself holds only a weak reference to its tasks.	Hold the Task in a set and discard it when it completes. Or use `TaskGroup`, which holds for you.
Task exception was never retrieved	A Task failed, and nothing ever awaited it or called `.result()` or `.exception()`. The exception surfaces only as this log line when the Task is garbage-collected.	Always `await` tasks you create, or attach `add_done_callback` that inspects `.exception()`.
got Future <…> attached to a different loop	You passed an awaitable between event loops. Each loop has its own selector and its own self-pipe; Futures are not portable.	Use `asyncio.run_coroutine_threadsafe` if you need to schedule into another loop from a thread.
This event loop is already running	You called `loop.run_until_complete` from inside an already-running loop. The runtime refuses to nest.	Use `await` if you're already in a coroutine, or `create_task` for fire-and-forget. Never nest `asyncio.run`.
There is no current event loop in thread '…'	You called `asyncio.get_event_loop()` with no loop set for the current thread: in a non-main thread (any version), or anywhere from 3.14 on. The implicit-loop behaviour is gone.	Use `asyncio.run(main())` at the entrypoint. Inside coroutines, use `get_running_loop()`.

Pattern: every one of these is a contract violation made visible. The framework is not punishing you; it is reporting a state it was not designed to handle, in the only place it can: at the point where the violation becomes externally observable. The "was never awaited" warning fires from the GC because that's the only place CPython can notice you dropped the coroutine. The "different loop" error fires from the Task's step function, when the coroutine awaits a Future that belongs to another loop, because that's the first point at which the mismatch can be observed. Once you read them as "the framework is telling you when its invariants broke," they stop being scary.
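The fix for the "destroyed but pending" row is short enough to show in full. This is a sketch of the keep-a-reference pattern; spawn, job and background_tasks are names chosen for the example, not asyncio APIs.

```python
import asyncio

# Module-level set: the strong references that keep tasks alive.
background_tasks = set()


def spawn(coro):
    """Start a fire-and-forget task without losing the only strong reference."""
    task = asyncio.create_task(coro)
    background_tasks.add(task)
    # Drop the reference once the task finishes, so the set does not grow.
    task.add_done_callback(background_tasks.discard)
    return task


async def job(results, n):
    await asyncio.sleep(0.01)
    results.append(n)


async def main():
    results = []
    for n in range(3):
        spawn(job(results, n))
    # Wait on a snapshot of the set; tasks remove themselves as they finish.
    await asyncio.gather(*list(background_tasks))
    return sorted(results)


def demo():
    return asyncio.run(main())
```

asyncio.TaskGroup gives the same guarantee with less ceremony when the tasks have a natural parent scope; the set is for tasks that outlive the function that started them.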

§5 asyncio.run() is the only entrypoint that still works

asyncio shipped in 3.4 with a sprawling API for managing event loops. Through the 3.x lifecycle most of it has been deprecated, narrowed, or quietly removed. The historical archaeology is occasionally useful — old codebases still use the deprecated forms, and the deprecation arc is informative — but for new code the recommendation collapses to a single function.

The story in versions:

3.4–3.6: asyncio.get_event_loop() would create a loop if none existed yet in the main thread. You wrote loop = get_event_loop(); loop.run_until_complete(main()). This implicit creation was the original sin: everything that touched a loop had a hidden side effect.
3.7: asyncio.run() introduced. Creates a loop, runs the coroutine, cancels whatever tasks are still pending, and closes the loop, all in one call. Recommended entrypoint from this version on.
3.10: get_event_loop() emits DeprecationWarning when it would create a loop. Existing loops still returned silently.
3.12: get_event_loop() still only warns. With no running loop and no current loop set, it emits DeprecationWarning and creates the loop anyway.
3.14: get_event_loop() raises RuntimeError if there is no current loop; the implicit-create behaviour is gone. Event loop policies (asyncio.AbstractEventLoopPolicy and friends) are deprecated in the same release. The whole policy hierarchy was a pluggability point that almost no one used; uvloop was the one real customer, and it now installs itself differently.

The current right answer is one line at the program entrypoint and one line inside coroutines:

import asyncio

async def main():
    loop = asyncio.get_running_loop()  # inside a coroutine: always works
    ...

asyncio.run(main())                  # at the entrypoint: creates and closes the loop

Trap

If you read older asyncio tutorials, almost every example will use asyncio.get_event_loop() at module level. With no loop set, that call emits a DeprecationWarning on 3.10 through 3.13 and raises RuntimeError from 3.14. If you maintain a codebase pinned to a 3.10 baseline, do a grep now: those calls get noisier and then start failing as you bump the interpreter.
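When auditing such a codebase, it helps to remember that get_running_loop() is strict in every supported version: it never creates anything. A minimal sketch, with function names invented for the example:

```python
import asyncio


def loop_visible_from_sync_code():
    """Outside a coroutine there is no running loop to find."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return False
    return True


async def main():
    # Inside a coroutine the running loop is always available.
    return asyncio.get_running_loop().is_running()


def demo():
    return loop_visible_from_sync_code(), asyncio.run(main())
```

Module-level code that needs a loop should therefore move into a coroutine started by asyncio.run(), rather than reaching for a loop at import time.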

§6 Contextvars are loop-aware on purpose

The contextvars module is one of the few features added to Python specifically because asyncio existed. Thread-local storage doesn't work for coroutines: many coroutines share one thread, and a threading.local variable set in one task would leak into all of them. The fix, in 3.7, was a different abstraction: a Context object that holds a snapshot of all ContextVar values, with explicit get/set semantics, and a loop that knows to swap contexts on every Task step.

The mechanism is built into the Task itself. When you call asyncio.create_task(coro), the Task constructor calls contextvars.copy_context() to snapshot the current context, and stores it. Every subsequent _step on that Task runs the coroutine inside that captured context via Context.run(). The result: a ContextVar set inside a request handler is visible to every coroutine that handler awaits, transitively, all the way down — but invisible to sibling tasks running concurrently on the same loop.

import contextvars, asyncio

request_id: contextvars.ContextVar[str] = contextvars.ContextVar("req")

async def inner():
    print("inner sees:", request_id.get())

async def handle(rid):
    request_id.set(rid)
    await inner()                  # sees rid

async def main():
    await asyncio.gather(
        handle("a"),                # prints "inner sees: a"
        handle("b"),                # prints "inner sees: b"
    )                              # neither contaminates the other

asyncio.run(main())

Each gathered task gets its own captured context. The set() in one doesn't leak to the other.

The asymmetry worth internalising: contextvars propagate within the loop for free, and across the loop boundary not at all. loop.run_in_executor drops you into a thread pool worker that doesn't share contextvars with the loop. Anything you wanted preserved across that boundary has to be passed explicitly, or carried over by taking a snapshot on the loop side with ctx = contextvars.copy_context() and submitting ctx.run(fn, *args) as the job; asyncio.to_thread (3.9+) makes that copy for you. OpenTelemetry's auto-instrumentation handles this for the standard executor; hand-rolled threading does not. Most "we lost the trace ID" production mysteries are this boundary crossing, unguarded.
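The three ways of crossing the boundary can be compared side by side. This is a sketch; the variable name request_id and the value "req-42" are placeholders, and the result of the bare executor call is described for a default CPython build (a build configured to let new threads inherit context may behave differently).

```python
import asyncio
import contextvars

request_id = contextvars.ContextVar("request_id", default="<unset>")


def read_request_id():
    """Plain blocking function; runs on a thread-pool worker."""
    return request_id.get()


async def main():
    loop = asyncio.get_running_loop()
    request_id.set("req-42")

    # 1. Bare executor call: the worker thread runs in its own context, so
    #    on a default CPython build it typically sees only the default.
    bare = await loop.run_in_executor(None, read_request_id)

    # 2. Snapshot on the loop side, run inside that snapshot on the worker.
    ctx = contextvars.copy_context()
    copied = await loop.run_in_executor(None, ctx.run, read_request_id)

    # 3. asyncio.to_thread makes the same copy for you.
    via_to_thread = await asyncio.to_thread(read_request_id)

    return bare, copied, via_to_thread


def demo():
    return asyncio.run(main())
```

Note that the snapshot is taken by the caller, before the hop. Calling copy_context() inside the worker would only copy the worker's own, empty context.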

From Go

context.Context in Go is the same idea with the opposite ergonomics. Go threads it explicitly through every function signature: you must accept and forward ctx. Python's contextvars are ambient: any function can read them, no signature change required. The Go version is more visible at call sites and harder to lose; the Python version is invisible and survives third-party libraries that don't know about your context. Both ship the same essential thing: request-scoped state that flows with the work, not with the thread.

§7 uvloop is the same scheduler, faster

uvloop is a drop-in replacement for asyncio's event loop, written in Cython on top of libuv (the same C library Node.js uses). It implements the same public API — same create_task, same TaskGroup, same await — and benchmarks at roughly two to four times the throughput of the stdlib loop for I/O-heavy workloads. Installing it is two lines:

import asyncio
import uvloop

async def main():
    ...                         # your application

uvloop.install()                # before asyncio.run
asyncio.run(main())

# In Python 3.12+ the recommended form skips install() entirely:
# asyncio.run(main(), loop_factory=uvloop.new_event_loop)

What you gain is performance. What you lose is small but worth knowing:

Selector module is bypassed. libuv uses epoll/kqueue directly; you cannot register fds with the stdlib selectors module and expect them to work. Most application code doesn't care.
No ProactorEventLoop on Windows. uvloop is Unix-only. Cross-platform projects either skip it on Windows or use a conditional install.
Subprocess semantics differ slightly. libuv's child-process handling is its own implementation; some edge cases around signal forwarding behave differently from stdlib.
Debug mode is less informative. The Cython internals don't surface the slow-callback warnings or origin tracking the way base_events.py does. If you rely on debug mode for development, stick to stdlib in dev and switch to uvloop in prod.

The mental model: every concept from Part 2 still holds. There is still one thread, one selector (libuv's), one _run_once-equivalent. Tasks still wrap coroutines, futures still resolve, the self-pipe still exists (as an eventfd this time). The difference is that all of it is compiled code: the inner loop doesn't allocate Python objects per callback, and the Task spine isn't paying interpreter overhead on every send. uvicorn picks it up automatically when it is installed, for exactly this reason; if you have measured your asyncio service and found that asyncio overhead is genuinely the bottleneck, uvloop is the cheapest win available.

uvloop is not a different model. It is the same model with the interpreter taken out of the hot path.

§8 A short list to take with you

The self-pipe is how the loop is woken from outside its own thread. Every cross-context primitive — executors, signals, call_soon_threadsafe — is a variant of writing one byte to it.
Signals deliver via set_wakeup_fd — the C handler writes one byte; the loop wakes and dispatches the Python handler synchronously on its own thread.
Debug mode is seven distinct checks bundled behind one flag. The cross-thread call_soon detector alone justifies running with it everywhere in dev.
asyncio errors are mechanism reports. "Was never awaited" is the GC noticing; "different loop" is a Future failing to attach. Read them backwards into the broken invariant.
asyncio.run() is the only entrypoint. get_event_loop() is a deprecated artefact of 3.4. Inside coroutines, get_running_loop().
Contextvars propagate across await for free, and across thread or process boundaries not at all. Copy explicitly when crossing.
uvloop is the same scheduler, written in C. If asyncio is your bottleneck, it is the cheapest win.

Up next · Part 4 of 7

The Two-Way Door

You've seen the loop drive coroutines, but never had to ask: what is a coroutine, mechanically? Part 4 goes down one level — generators, send, throw, yield from, and the desugaring that turns await into a yield from the iterator returned by __await__. With a REPL at your elbow and a complete forty-line asyncio you can step through under pdb. After this, the discipline layer (Part 5) and the cancellation trap (Part 6) stop looking like magic.