Introducing Retrace:
Record Production Python. Debug It Backwards.
Production bugs are hard because you can’t reproduce them.
The request that triggered the crash is gone. The database state has moved on. The external API returned something unexpected, but now it returns something else. You’re left reconstructing what happened from logs, metrics, and guesswork.
The problem is that those signals only capture what someone predicted would matter. The actual execution ran once and vanished.
Retrace changes that. It records the failing Python execution once, in production, with <0.1% overhead on typical web service workloads. Then you can replay it deterministically in VS Code and do something ordinary Python debuggers can’t: start at the crash and step backwards to the cause.
Today, we’re releasing an open-source preview of Retrace: deterministic record-replay for Python, with local replay debugging in VS Code. We believe it is the first reverse debugger designed for production CPython applications.
Record once. Replay locally. Step backwards from the crash to the cause.
A real example: an API response you can’t reproduce
Imagine a Python service that classifies customer messages by calling an external API. In testing it works. In production, one request occasionally crashes.
In this example the external API is an LLM provider, but the same failure pattern appears with payment APIs, webhooks, queues, databases, flaky internal services, and CI-only data.
import json

# `llm` is the service's LLM client, configured elsewhere.
def classify_intent(message):
    # We ask for JSON, but rare non-JSON responses still occur in production.
    response = llm.chat(prompt=f"Classify: {message}")
    data = json.loads(response)  # <-- JSONDecodeError
    return data["intent"]
The crash is obvious: json.loads raises a JSONDecodeError.
The hard part is not understanding the exception. The hard part is recovering the exact response and execution state that caused it.
Why normal debugging fails
- The failure is non-deterministic: rerunning locally won’t reproduce the exact response.
- Logs rarely include the full prompt and response because of cost, noise, and PII.
- APM can show the stack trace and timing, but it cannot recreate the external call or the surrounding execution state.
- A live debugger only helps if you can make the same failure happen again.
That is the core problem Retrace solves: you no longer need to reproduce the failure. You already recorded the execution that failed.
With Retrace
With Retrace, you record the failing execution once:
Then open the recording in VS Code:
# Open recordings/crash.retrace
# Start replay from the Retrace sidebar
The debugger runs the recorded execution, not a live process. You can set breakpoints, inspect variables, step forwards, and step backwards through what already happened.
Now you can inspect the exact response that caused the crash.
Root cause found: the external API returned conversational text instead of JSON. In replay, you can inspect the exact prompt, response, retries, and local state that triggered the failure. Then you can turn the recorded execution into a regression case.
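One way to turn that finding into a regression case is sketched below. This is an illustrative test, not something Retrace generates: `FakeLLM`, `IntentParseError`, and the sample response text are all hypothetical, and the `llm` client is passed in as a parameter (a change from the original function) so the test can stub it.

```python
import json


class IntentParseError(ValueError):
    """Hypothetical error raised when the model reply is not valid JSON."""


class FakeLLM:
    """Hypothetical stub that always returns one canned response."""

    def __init__(self, response):
        self.response = response

    def chat(self, prompt):
        return self.response


def classify_intent(message, llm):
    response = llm.chat(prompt=f"Classify: {message}")
    try:
        data = json.loads(response)
    except json.JSONDecodeError as exc:
        # Keep the raw response in the error so the next failure is diagnosable.
        raise IntentParseError(f"non-JSON response: {response!r}") from exc
    return data["intent"]


def test_conversational_reply_is_reported():
    # Placeholder text; in practice, paste the response seen in the replay.
    llm = FakeLLM("Sure! I think this is a refund request.")
    try:
        classify_intent("I want my money back", llm)
    except IntentParseError as exc:
        assert "refund request" in str(exc)
    else:
        raise AssertionError("expected IntentParseError")


def test_json_reply_still_works():
    llm = FakeLLM('{"intent": "refund"}')
    assert classify_intent("I want my money back", llm) == "refund"
```

The first test pins the failure mode found in replay; the second guards the normal path so the fix does not break well-formed responses.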
LLMs make nondeterminism obvious, but this is not an AI-only problem. The same pattern appears with flaky APIs, race conditions, CI-only failures, and production-only data.
Why this took six years
Deterministic replay sounds straightforward: capture external inputs, replay them in the same order.
In practice, Python has pervasive nondeterminism that even experienced Python developers don’t usually think about until they try to replay a real production execution.
Threading. The GIL does not make execution order deterministic. Two runs of the same program can execute bytecode in different orders. Retrace preserves the original thread interleaving so replay follows the same schedule.
External libraries. Connection pools, retries, lazy initialization, caches, file handles, database clients, HTTP libraries, and background threads can all introduce behavior that is invisible during normal execution but matters during replay.
Object identity and C extensions. Python programs do not only move through pure Python code. They pass through C extension types, wrapped objects, descriptors, callbacks, and library internals. Retrace has to preserve enough identity and boundary behavior for replay to behave like the original execution.
The observer effect. Debuggers change the programs they observe. Attaching a debugger, importing modules, setting breakpoints, or communicating with the debug client can all perturb the execution. Retrace avoids this by recording first, then debugging the replay later.
Early versions of Retrace tried shallower hooks. That worked for small examples, but broke on real applications. The current system records at the Python boundary, preserves thread ordering, and replays the same Python program with recorded external results.
The result: real Python applications can be recorded with low overhead and replayed locally as deterministic debug sessions. Full benchmark methodology is at docs/performance.md.
How Retrace works
Retrace divides execution into two worlds:
Internal code: your application logic. Given the same inputs, this should behave deterministically.
External code: network calls, database queries, filesystem access, time, randomness, and other sources of nondeterminism.
Retrace does not record every Python instruction in production. Instead, it records the boundary between your deterministic application code and the nondeterministic outside world.
During recording, calls crossing that boundary are intercepted. Retrace records the call, the arguments, and the result or exception.
your code → Retrace boundary → external library/API
                             ↘ result recorded
During replay, the same Python code runs again. But when it reaches an external boundary, Retrace returns the recorded result instead of making the real external call.
your code → Retrace boundary → recorded result → your code
Because the replay receives the same external results, and because thread ordering is preserved, the execution follows the original production run.
The result is a portable recording: no production database, no live API, no credentials, no network calls. Just the execution that already happened, replayed locally.
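The record-then-replay idea can be illustrated with a small sketch. It is a toy model under stated assumptions, not Retrace's implementation: the `Boundary` class, its `call` method, and the in-memory list used as a trace are all hypothetical names chosen for this example.

```python
import random
import time


class Boundary:
    """Hypothetical record/replay boundary for plain callables."""

    def __init__(self, mode, trace=None):
        if mode not in ("record", "replay"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.trace = [] if trace is None else list(trace)
        self._cursor = 0

    def call(self, name, func, *args, **kwargs):
        if self.mode == "record":
            # Record the call, its arguments, and the result or exception.
            try:
                result = func(*args, **kwargs)
            except Exception as exc:
                self.trace.append((name, args, kwargs, "raise", exc))
                raise
            self.trace.append((name, args, kwargs, "return", result))
            return result

        # Replay: never call func; hand back what was recorded.
        rec_name, rec_args, rec_kwargs, kind, value = self.trace[self._cursor]
        self._cursor += 1
        if (rec_name, rec_args, rec_kwargs) != (name, args, kwargs):
            raise RuntimeError(f"replay diverged at {name}")
        if kind == "raise":
            raise value
        return value


def handle_request(boundary):
    """Internal code: deterministic once the boundary results are fixed."""
    started = boundary.call("time.time", time.time)
    roll = boundary.call("random.random", random.random)
    return f"{started:.6f}:{'retry' if roll < 0.5 else 'ok'}"


recorder = Boundary("record")
original = handle_request(recorder)

# Replay uses only the trace: no clock read, no random draw.
replayer = Boundary("replay", recorder.trace)
replayed = handle_request(replayer)
```

Here `replayed` equals `original` even though the clock and the random generator would give different values on a second live run, because the internal code sees exactly the recorded results.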
Step backwards through production crashes
Most debuggers only go forwards. You set a breakpoint, hope it is in the right place, and re-run. With production bugs, there is no re-run. The execution already happened.
Retrace changes the workflow.
You start where normal debugging usually ends: at the crash. Then you walk backwards.
In VS Code, a Retrace replay lets you:
- Step Back: walk backwards one statement at a time.
- Reverse Continue: run backwards to the previous breakpoint.
- Inspect variables: see the values that existed at each point in the recorded execution.
- Move forwards again: step through the replay like a normal debug session.
That means no log correlation, no guessing where the breakpoint should have been, and no trying to recreate production state locally. You debug the execution that actually failed.
Architecture
Retrace has four main components.
1. Proxy system
The proxy system intercepts calls at the internal/external boundary. It dynamically wraps external objects and routes boundary calls through recording or replay logic.
This is what lets Retrace capture Python-level behavior without recording every syscall or instrumenting every line of your application.
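As a rough illustration of wrapping an external object dynamically, the sketch below uses `__getattr__` to intercept method calls. The names `RecordingProxy`, `ReplayProxy`, and `FakeHttpClient` are hypothetical, and the approach is a simplified assumption about how such a proxy could look, not a description of Retrace's internals.

```python
class RecordingProxy:
    """Hypothetical proxy that logs every method call made on `target`."""

    def __init__(self, target, log):
        self._target = target
        self._log = log

    def __getattr__(self, name):
        # Only reached for attributes not set on the proxy itself.
        attr = getattr(self._target, name)
        if not callable(attr):
            return attr

        def wrapper(*args, **kwargs):
            result = attr(*args, **kwargs)
            self._log.append((name, args, kwargs, result))
            return result

        return wrapper


class ReplayProxy:
    """Hypothetical stand-in that answers calls from the log; no real target."""

    def __init__(self, log):
        self._pending = iter(log)

    def __getattr__(self, name):
        def wrapper(*args, **kwargs):
            rec_name, rec_args, rec_kwargs, result = next(self._pending)
            if (rec_name, rec_args, rec_kwargs) != (name, args, kwargs):
                raise RuntimeError(f"replay diverged at {name}")
            return result

        return wrapper


class FakeHttpClient:
    """Stand-in for an external library object (an assumption for the demo)."""

    def __init__(self):
        self.calls = 0

    def get(self, url):
        self.calls += 1
        return {"url": url, "status": 200, "attempt": self.calls}


def fetch_status(client):
    """Application code: unaware of whether it talks to a proxy."""
    return client.get("https://example.invalid/health")["status"]


log = []
real_client = FakeHttpClient()
recorded_status = fetch_status(RecordingProxy(real_client, log))
replayed_status = fetch_status(ReplayProxy(log))
```

The application function is identical in both runs; only the object handed to it changes, and the real client is called once, during recording.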
2. Stream writer
Recorded calls and results are written to a compact binary trace. The recording path is designed to keep work off the application’s hot path: the application thread records the event and returns, while serialization and persistence happen asynchronously.
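The split between a cheap enqueue on the application thread and asynchronous serialization can be sketched as follows. This is a minimal model under assumptions: the `StreamWriter` class, the length-prefixed pickle framing, and `read_trace` are hypothetical and say nothing about Retrace's actual trace format.

```python
import io
import pickle
import queue
import struct
import threading

_STOP = object()  # sentinel telling the background thread to finish


class StreamWriter:
    """Hypothetical writer: enqueue on the caller's thread, serialize elsewhere."""

    def __init__(self, sink):
        self._sink = sink
        self._queue = queue.SimpleQueue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def record(self, event):
        # Hot path: a single enqueue. Events should be immutable (for example
        # tuples), since the object is serialized later on another thread.
        self._queue.put(event)

    def close(self):
        # Flush everything queued so far, then stop the worker.
        self._queue.put(_STOP)
        self._worker.join()

    def _drain(self):
        while True:
            event = self._queue.get()
            if event is _STOP:
                return
            payload = pickle.dumps(event)
            # Frame each record as a 4-byte little-endian length plus payload.
            self._sink.write(struct.pack("<I", len(payload)))
            self._sink.write(payload)


def read_trace(source):
    """Read back every framed record from a binary stream."""
    events = []
    while True:
        header = source.read(4)
        if len(header) < 4:
            return events
        (size,) = struct.unpack("<I", header)
        events.append(pickle.loads(source.read(size)))


buffer = io.BytesIO()
writer = StreamWriter(buffer)
sample_events = [("time.time", (), 1700000000.0), ("http.get", ("/health",), 200)]
for event in sample_events:
    writer.record(event)
writer.close()

buffer.seek(0)
restored = read_trace(buffer)
```

A single queue drained by one worker also preserves the order in which events were enqueued, which matters for any replay that consumes the trace sequentially.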
3. Thread demultiplexer
Real Python programs use threads, async runtimes, background work, and libraries that introduce scheduling nondeterminism. Retrace records the original interleaving and uses a C++ demultiplexer during replay so that threads run in the same order as they did during recording.
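To make the idea of replaying a recorded interleaving concrete, here is a small pure-Python sketch. It is not the C++ demultiplexer: `ReplayScheduler`, `run_turn`, and the list-of-thread-names schedule are hypothetical, and a real system would schedule at a much finer granularity than whole callbacks.

```python
import threading


class ReplayScheduler:
    """Hypothetical scheduler that lets threads proceed only in a recorded order."""

    def __init__(self, schedule):
        self._schedule = list(schedule)
        self._position = 0
        self._cond = threading.Condition()

    def _is_turn(self, thread_name):
        return (
            self._position < len(self._schedule)
            and self._schedule[self._position] == thread_name
        )

    def run_turn(self, thread_name, action):
        with self._cond:
            # Block until the recorded schedule says it is this thread's turn.
            self._cond.wait_for(lambda: self._is_turn(thread_name))
            action()
            self._position += 1
            self._cond.notify_all()


def replay(schedule):
    """Run one worker per thread name and return the order actions executed in."""
    scheduler = ReplayScheduler(schedule)
    observed = []
    turns = {name: schedule.count(name) for name in set(schedule)}

    def worker(name, count):
        for step in range(count):
            # The action runs synchronously inside run_turn, so `step` is current.
            scheduler.run_turn(name, lambda: observed.append((name, step)))

    threads = [
        threading.Thread(target=worker, args=(name, count))
        for name, count in turns.items()
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return observed


recorded_schedule = ["A", "B", "B", "A", "B"]
first_run = replay(recorded_schedule)
second_run = replay(recorded_schedule)
```

Both runs produce the same sequence regardless of how the operating system happens to schedule the two threads, because each thread waits for its recorded turn.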
4. VS Code replay debugger
Retrace includes a custom Debug Adapter Protocol implementation for replay debugging. In enhanced mode, a replay proxy manages multiple replay states so VS Code can step backwards as well as forwards.
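One common way to get backward stepping from a deterministic replay is to keep periodic checkpoints and, on a step back, restore the nearest earlier checkpoint and re-run forward. The sketch below illustrates that general technique only. Whether Retrace's replay proxy works this way is an assumption, and `ReversibleReplay`, `step_fn`, and `checkpoint_every` are hypothetical names.

```python
import copy


class ReversibleReplay:
    """Hypothetical session: step_back restores a checkpoint and re-runs forward."""

    def __init__(self, initial_state, step_fn, checkpoint_every=4):
        self._step_fn = step_fn  # must be deterministic: step_fn(state, index)
        self._every = checkpoint_every
        self._checkpoints = {0: copy.deepcopy(initial_state)}
        self.state = copy.deepcopy(initial_state)
        self.position = 0

    def step_forward(self):
        self._step_fn(self.state, self.position)
        self.position += 1
        if self.position % self._every == 0:
            # Snapshot the state so later backward steps have less to re-run.
            self._checkpoints.setdefault(self.position, copy.deepcopy(self.state))

    def step_back(self):
        if self.position == 0:
            raise IndexError("already at the start of the recording")
        target = self.position - 1
        # Restore the latest checkpoint at or before the target position.
        base = max(p for p in self._checkpoints if p <= target)
        self.state = copy.deepcopy(self._checkpoints[base])
        self.position = base
        while self.position < target:
            self.step_forward()


# Stand-in for recorded external results consumed one per step.
recorded_inputs = [3, 1, 4, 1, 5, 9, 2, 6]


def apply_step(state, index):
    state["total"] += recorded_inputs[index]
    state["seen"].append(recorded_inputs[index])


session = ReversibleReplay({"total": 0, "seen": []}, apply_step, checkpoint_every=3)
for _ in range(7):
    session.step_forward()
total_at_7 = session.state["total"]

session.step_back()  # lands exactly on the checkpoint at position 6
total_at_6 = session.state["total"]

session.step_back()  # restores position 3, then re-runs steps 3 and 4
total_at_5 = session.state["total"]
```

The technique depends on replay being deterministic: re-running forward from a checkpoint only reproduces earlier states if every step sees the same recorded inputs. The checkpoint interval trades memory for the amount of re-execution per backward step.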
The architectural decision is to capture Python semantics, not just syscalls. Retrace records boundary calls, return values, object behavior, and thread ordering at the level Python developers actually debug.
What this is not
Retrace is not an APM tool. It does not sample traces or aggregate metrics.
It is not a logging library. You do not decide in advance which variables might matter.
It is not rr for Python. We are not recording an entire machine process at the syscall level.
Retrace records the boundary between your Python code and the nondeterministic outside world, then replays the same Python code locally with those recorded results.
Try it with the Quickstart
The fastest way to try Retrace is the included Flask quickstart.
The quickstart walks through:
- creating a Python environment,
- installing Retrace,
- recording a Flask execution,
- opening the recording in VS Code,
- setting breakpoints,
- stepping forwards and backwards through the recorded execution.
cd retracesoftware/quickstart
python3.12 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install retracesoftware
python -m retracesoftware install
python -m pip install -r requirements.txt
RETRACE_RECORDING=recordings/flask.retrace python examples/flask_demo.py
code .
Supported today
The open-source preview currently supports:
- Python: 3.11 and 3.12
- Operating systems: macOS and Linux
- Frameworks: Flask and Django
- HTTP: Requests
- Database: psycopg2 / PostgreSQL
- Core behavior: threading, forking, time, randomness, environment variables, file I/O
- Debugger: VS Code replay debugging
If your stack isn't covered yet, open an issue. We are expanding support based on what real users need.
Get involved
GitHub – star the repo, read the source, or contribute.
Docs – follow the setup guides and supported-environment notes.
Issues – report bugs or request library support.
Discussions – ask questions, share use cases, or tell us what failed.
We have been working on this problem for six years. We are excited to finally put it in developers’ hands.
Questions? Email hello@retracesoftware.com.