May 12th, 2026

Introducing Retrace:
Record Production Python. Debug It Backwards.

Production bugs are hard because you can’t reproduce them.

The request that triggered the crash is gone. The database state has moved on. The external API returned something unexpected, but now it returns something else. You’re left reconstructing what happened from logs, metrics, and guesswork.

The problem is that those signals only capture what someone predicted would matter. The actual execution ran once and vanished.

Retrace changes that. It records the failing Python execution once, in production, with <0.1% overhead on typical web service workloads. Then you can replay it deterministically in VS Code and do something ordinary Python debuggers can’t: start at the crash and step backwards to the cause.

Today, we’re releasing an open-source preview of Retrace: deterministic record-replay for Python, with local replay debugging in VS Code. We believe it is the first reverse debugger designed for production CPython applications.

Record once. Replay locally. Step backwards from the crash to the cause.

A real example: an API response you can’t reproduce

Imagine a Python service that classifies customer messages by calling an external API. In testing it works. In production, one request occasionally crashes.

In this example the external API is an LLM provider, but the same failure pattern appears with payment APIs, webhooks, queues, databases, flaky internal services, and CI-only data.

python
# Your code
import json

def classify_intent(message):
    # We ask for JSON, but rare non-JSON responses still occur in production.
    # `llm` is your provider's client object, created elsewhere.
    response = llm.chat(prompt=f"Classify: {message}")
    data = json.loads(response)  # <-- JSONDecodeError
    return data["intent"]

The crash is obvious:

text
JSONDecodeError: Expecting value: line 1 column 1 (char 0)

The hard part is not understanding the exception. The hard part is recovering the exact response and execution state that caused it.

Why normal debugging fails
  • The failure is non-deterministic: rerunning locally won’t reproduce the exact response.
  • Logs rarely include the full prompt and response because of cost, noise, and PII.
  • APM can show the stack trace and timing, but it cannot recreate the external call or the surrounding execution state.
  • A live debugger only helps if you can make the same failure happen again.

That is the core problem Retrace solves: you no longer need to reproduce the failure. You already recorded the execution that failed.

With Retrace

With Retrace, debugging the crash takes two steps:

1. Record the failing execution once:

bash
RETRACE_RECORDING=recordings/crash.retrace python app.py

2. Open the recording in VS Code:

bash
code .
# Open recordings/crash.retrace
# Start replay from the Retrace sidebar

The debugger runs the recorded execution, not a live process. You can set breakpoints, inspect variables, step forwards, and step backwards through what already happened.

Now you can inspect the exact response that caused the crash:

python
response = "Sure! The intent is billing with high confidence."

Root cause found: the external API returned conversational text instead of JSON. In replay, you can inspect the exact prompt, response, retries, and local state that triggered the failure. Then you can turn the recorded execution into a regression case.
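
One way to turn that recorded response into a regression case is to pin it in a test. The sketch below is only an illustration: `FakeLLM` is a hypothetical stub, and the hardened `classify_intent` (which takes the client as a parameter and falls back to "unknown") is an assumed fix, not something Retrace generates.

```python
import json


class FakeLLM:
    """Hypothetical stub that returns one canned response, as a replay would."""

    def __init__(self, canned_response):
        self.canned_response = canned_response

    def chat(self, prompt):
        return self.canned_response


def classify_intent(message, llm):
    """Hardened variant (an assumption): fall back instead of crashing."""
    response = llm.chat(prompt=f"Classify: {message}")
    try:
        data = json.loads(response)
    except json.JSONDecodeError:
        return "unknown"
    if not isinstance(data, dict):
        return "unknown"
    return data.get("intent", "unknown")


# The exact response observed in the replay above.
RECORDED_RESPONSE = "Sure! The intent is billing with high confidence."


def test_non_json_response_does_not_crash():
    llm = FakeLLM(RECORDED_RESPONSE)
    assert classify_intent("Why was I charged twice?", llm) == "unknown"


def test_json_response_still_works():
    llm = FakeLLM('{"intent": "billing"}')
    assert classify_intent("Why was I charged twice?", llm) == "billing"
```

The first test fails against the original function and passes once the fallback is in place, so the production failure stays covered after the fix ships.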

LLMs make nondeterminism obvious, but this is not an AI-only problem. The same pattern appears with flaky APIs, race conditions, CI-only failures, and production-only data.

Why this took six years

Deterministic replay sounds straightforward: capture external inputs, replay them in the same order.

In practice, Python has pervasive nondeterminism that even experienced Python developers don’t usually think about until they try to replay a real production execution.

Threading. The GIL ensures that only one thread executes Python bytecode at a time, but it does not fix the order in which threads take turns. Two runs of the same program can interleave their threads differently. Retrace preserves the original thread interleaving so replay follows the same schedule.

External libraries. Connection pools, retries, lazy initialization, caches, file handles, database clients, HTTP libraries, and background threads can all introduce behavior that is invisible during normal execution but matters during replay.

Object identity and C extensions. Python programs do not only move through pure Python code. They pass through C extension types, wrapped objects, descriptors, callbacks, and library internals. Retrace has to preserve enough identity and boundary behavior for replay to behave like the original execution.

The observer effect. Debuggers change the programs they observe. Attaching a debugger, importing modules, setting breakpoints, or communicating with the debug client can all perturb the execution. Retrace avoids this by recording first, then debugging the replay later.

Early versions of Retrace tried shallower hooks. That worked for small examples, but broke on real applications. The current system records at the Python boundary, preserves thread ordering, and replays the same Python program with recorded external results.

The result: real Python applications can be recorded with low overhead and replayed locally as deterministic debug sessions. Full benchmark methodology is at docs/performance.md.

How Retrace works

Retrace divides execution into two worlds:

Internal code: your application logic. Given the same inputs, this should behave deterministically.

External code: network calls, database queries, filesystem access, time, randomness, and other sources of nondeterminism.

Retrace does not record every Python instruction in production. Instead, it records the boundary between your deterministic application code and the nondeterministic outside world.

During recording, calls crossing that boundary are intercepted. Retrace records the call, the arguments, and the result or exception.

text
Recording:
your code → Retrace boundary → external library/API
          ↘ result recorded

During replay, the same Python code runs again. But when it reaches an external boundary, Retrace returns the recorded result instead of making the real external call.

text
Replay:
your code → Retrace boundary → recorded result → your code

Because the replay receives the same external results, and because thread ordering is preserved, the execution follows the original production run.
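
The record and replay flow described above can be illustrated with a toy boundary object. This is a minimal sketch under stated assumptions, not Retrace's implementation: the `Boundary` class, its `call` method, and the in-memory list used as a trace are all hypothetical names chosen for the example.

```python
import random
import time


class Boundary:
    """Toy record/replay boundary; not Retrace's actual implementation."""

    def __init__(self, mode, trace=None):
        self.mode = mode              # "record" or "replay"
        self.trace = trace if trace is not None else []
        self._cursor = 0              # next trace entry to consume in replay

    def call(self, name, func, *args):
        if self.mode == "record":
            try:
                result = func(*args)
            except Exception as exc:
                self.trace.append((name, args, "raise", exc))
                raise
            self.trace.append((name, args, "return", result))
            return result

        # Replay: never touch the real function; hand back what was recorded.
        rec_name, rec_args, kind, value = self.trace[self._cursor]
        self._cursor += 1
        if (rec_name, rec_args) != (name, args):
            raise RuntimeError(f"replay diverged at {name}{args}")
        if kind == "raise":
            raise value
        return value


def handle_request(boundary):
    """Internal code: deterministic given the boundary's results."""
    started = boundary.call("time.time", time.time)
    roll = boundary.call("random.randint", random.randint, 1, 6)
    return f"{started:.3f}:{roll}"


recorder = Boundary("record")
original = handle_request(recorder)

replayer = Boundary("replay", trace=recorder.trace)
replayed = handle_request(replayer)
assert replayed == original
```

During replay neither `time.time` nor `random.randint` is invoked, yet `handle_request` produces the same string, which is the property the real system relies on at a much larger scale.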

The result is a portable recording: no production database, no live API, no credentials, no network calls. Just the execution that already happened, replayed locally.

Step backwards through production crashes

Most debuggers only go forwards. You set a breakpoint, hope it is in the right place, and re-run. With production bugs, there is no re-run. The execution already happened.

Retrace changes the workflow.

You start where normal debugging usually ends: at the crash. Then you walk backwards.

In VS Code, a Retrace replay lets you:

  • Step Back: walk backwards one statement at a time.
  • Reverse Continue: run backwards to the previous breakpoint.
  • Inspect variables: see the values that existed at each point in the recorded execution.
  • Move forwards again: step through the replay like a normal debug session.

That means no log correlation, no guessing where the breakpoint should have been, and no trying to recreate production state locally. You debug the execution that actually failed.

Architecture

Retrace has four main components.

1. Proxy system

The proxy system intercepts calls at the internal/external boundary. It dynamically wraps external objects and routes boundary calls through recording or replay logic.

This is what lets Retrace capture Python-level behavior without recording every syscall or instrumenting every line of your application.
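
To give a feel for what dynamically wrapping an external object can look like in Python, here is a minimal sketch built on `__getattr__`. The names `RecordingProxy` and `FakeHttpClient` are hypothetical, and the real proxy system also has to handle C extension types, descriptors, and identity, which this example ignores.

```python
class RecordingProxy:
    """Wraps an external object and logs every method call made through it."""

    def __init__(self, target, log):
        self._target = target
        self._log = log

    def __getattr__(self, name):
        # Only invoked for attributes not found on the proxy itself.
        attr = getattr(self._target, name)
        if not callable(attr):
            return attr

        def wrapper(*args, **kwargs):
            result = attr(*args, **kwargs)
            self._log.append((name, args, kwargs, result))
            return result

        return wrapper


class FakeHttpClient:
    """Hypothetical stand-in for an external library object."""

    timeout = 5

    def get(self, url):
        return {"url": url, "status": 200}


log = []
client = RecordingProxy(FakeHttpClient(), log)
response = client.get("https://example.invalid/orders/1")
```

Application code keeps calling `client.get(...)` as before; only the object it holds has changed, which is why this style of interception needs no edits to the calling code.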

2. Stream writer

Recorded calls and results are written to a compact binary trace. The recording path is designed to keep work off the application’s hot path: the application thread records the event and returns, while serialization and persistence happen asynchronously.
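
The split between a cheap hot path and asynchronous persistence can be sketched with a queue and a background thread. This is an illustration only: the `StreamWriter` class, the length-prefixed pickle framing, and the `read_events` helper are assumptions made for the example, not Retrace's trace format.

```python
import io
import pickle
import queue
import struct
import threading


class StreamWriter:
    """Toy asynchronous trace writer: length-prefixed pickled events."""

    _STOP = object()

    def __init__(self, sink):
        self._sink = sink                 # any binary file-like object
        self._events = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def record(self, event):
        # Hot path: enqueue and return; no serialization or I/O here.
        self._events.put(event)

    def _drain(self):
        # Background thread: serialize and persist events in arrival order.
        while True:
            event = self._events.get()
            if event is self._STOP:
                break
            payload = pickle.dumps(event)
            self._sink.write(struct.pack("<I", len(payload)))
            self._sink.write(payload)

    def close(self):
        self._events.put(self._STOP)
        self._worker.join()


def read_events(data):
    """Decode the length-prefixed stream back into a list of events."""
    events, offset = [], 0
    while offset < len(data):
        (size,) = struct.unpack_from("<I", data, offset)
        offset += 4
        events.append(pickle.loads(data[offset:offset + size]))
        offset += size
    return events


sink = io.BytesIO()
writer = StreamWriter(sink)
writer.record(("time.time", (), 1715500000.0))
writer.record(("requests.get", ("https://example.invalid",), 200))
writer.close()
decoded = read_events(sink.getvalue())
```

One caveat of deferring serialization this way is that a mutable event could change between `record` and the write, so a real recorder has to capture values that are immutable or copied.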

3. Thread demultiplexer

Real Python programs use threads, async runtimes, background work, and libraries that introduce scheduling nondeterminism. Retrace records the original interleaving and uses a C++ demultiplexer during replay to ensure each thread runs in the same order.
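
The idea of forcing threads to run in a recorded order can be shown in a few lines of Python. Retrace's demultiplexer is written in C++; the `ReplaySchedule` class below is a hypothetical simplification that assumes each thread takes exactly the turns listed for it in the recorded order.

```python
import threading


class ReplaySchedule:
    """Lets threads proceed only in a previously recorded order."""

    def __init__(self, order):
        self._order = list(order)     # e.g. ["B", "A", "A"]
        self._position = 0
        self._cond = threading.Condition()

    def run_turn(self, thread_name, action):
        with self._cond:
            # Block until the recorded schedule says it is this thread's turn.
            self._cond.wait_for(
                lambda: self._order[self._position] == thread_name
            )
            action()
            self._position += 1
            self._cond.notify_all()


def worker(name, turns, schedule, output):
    for i in range(turns):
        schedule.run_turn(name, lambda i=i: output.append(f"{name}{i}"))


recorded_order = ["B", "A", "A", "B", "A"]
output = []
schedule = ReplaySchedule(recorded_order)
threads = [
    threading.Thread(target=worker, args=("A", 3, schedule, output)),
    threading.Thread(target=worker, args=("B", 2, schedule, output)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

However the operating system schedules the two threads, `output` ends up in the order dictated by `recorded_order`, because a thread that arrives early simply waits for its turn.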

4. VS Code replay debugger

Retrace includes a custom Debug Adapter Protocol implementation for replay debugging. In enhanced mode, a replay proxy manages multiple replay states so VS Code can step backwards as well as forwards.
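
A common way to get backwards stepping out of a deterministic replay is to keep periodic snapshots and re-execute forwards to the point just before the current one. The sketch below illustrates that general technique; `ReversibleReplay` and its dictionary state are hypothetical, and this is not a description of how Retrace's replay proxy manages its replay states.

```python
import copy


class ReversibleReplay:
    """Toy reverse stepping: periodic snapshots plus forward re-execution."""

    def __init__(self, initial_state, steps, checkpoint_every=2):
        self._steps = steps                    # deterministic functions mutating state
        self._every = checkpoint_every
        self._checkpoints = {0: copy.deepcopy(initial_state)}
        self.state = copy.deepcopy(initial_state)
        self.position = 0                      # number of steps already applied

    def step_forward(self):
        if self.position >= len(self._steps):
            return
        self._steps[self.position](self.state)
        self.position += 1
        if self.position % self._every == 0:
            self._checkpoints[self.position] = copy.deepcopy(self.state)

    def step_back(self):
        if self.position == 0:
            return
        target = self.position - 1
        # Restore the latest snapshot at or before the target, then re-run.
        base = max(p for p in self._checkpoints if p <= target)
        self.state = copy.deepcopy(self._checkpoints[base])
        self.position = base
        while self.position < target:
            self.step_forward()


steps = [
    lambda s: s.update(response="Sure! The intent is billing."),
    lambda s: s.update(parsed=False),
    lambda s: s.update(error="JSONDecodeError"),
]
replay = ReversibleReplay({}, steps)
for _ in steps:
    replay.step_forward()
at_crash = dict(replay.state)

replay.step_back()
before_crash = dict(replay.state)
```

This only works because every step is deterministic: re-running from a snapshot is guaranteed to reproduce the same intermediate states, which is exactly what a recorded execution provides.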

The architectural decision is to capture Python semantics, not just syscalls. Retrace records boundary calls, return values, object behavior, and thread ordering at the level Python developers actually debug.

What this is not

Retrace is not an APM tool. It does not sample traces or aggregate metrics.

It is not a logging library. You do not decide in advance which variables might matter.

It is not rr for Python. We are not recording an entire process at the syscall level.

Retrace records the boundary between your Python code and the nondeterministic outside world, then replays the same Python code locally with those recorded results.

Try it with the Quickstart

The fastest way to try Retrace is the included Flask quickstart.

The quickstart walks through:

  1. creating a Python environment,
  2. installing Retrace,
  3. recording a Flask execution,
  4. opening the recording in VS Code,
  5. setting breakpoints,
  6. stepping forwards and backwards through the recorded execution.

bash
git clone https://github.com/retracesoftware/retracesoftware.git
cd retracesoftware/quickstart

python3.12 -m venv .venv
source .venv/bin/activate

python -m pip install --upgrade pip
python -m pip install retracesoftware
python -m retracesoftware install
python -m pip install -r requirements.txt

RETRACE_RECORDING=recordings/flask.retrace python examples/flask_demo.py
code .

Supported today

The open-source preview currently supports:

  • Python: 3.11 and 3.12
  • Operating systems: macOS and Linux
  • Frameworks: Flask and Django
  • HTTP: Requests
  • Database: psycopg2 / PostgreSQL
  • Core behavior: threading, forking, time, randomness, environment variables, file I/O
  • Debugger: VS Code replay debugging
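
Before installing, you can check an environment against the Python versions and operating systems listed above. This is a small sketch; the helper name `preview_supports` is hypothetical and is not part of the Retrace package.

```python
import platform
import sys

SUPPORTED_PYTHONS = {(3, 11), (3, 12)}
# platform.system() returns "Darwin" on macOS and "Linux" on Linux.
SUPPORTED_SYSTEMS = {"Darwin", "Linux"}


def preview_supports(version_info=None, system=None):
    """Return True if the interpreter and OS match the list above."""
    version_info = sys.version_info if version_info is None else version_info
    system = platform.system() if system is None else system
    major_minor = tuple(version_info[:2])
    return major_minor in SUPPORTED_PYTHONS and system in SUPPORTED_SYSTEMS


if __name__ == "__main__":
    print("supported" if preview_supports() else "not supported yet")
```

Passing explicit arguments, for example `preview_supports((3, 10, 0), "Linux")`, lets the same check run in CI for environments other than the current one.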

If your stack isn't covered yet, open an issue. We are expanding support based on what real users need.

Get involved

GitHub – star the repo, read the source, or contribute.

Docs – follow the setup guides and supported-environment notes.

Issues – report bugs or request library support.

Discussions – ask questions, share use cases, or tell us what failed.

We have been working on this problem for six years. We are excited to finally put it in developers’ hands.

Questions? Email hello@retracesoftware.com.