14 January, 2026

Introducing Retrace: Deterministic Record-Replay for Python

Production bugs are maddening because you can't reproduce them. The request that triggered a crash is gone. The database state has moved on. The external API that returned something unexpected is now returning something else. You're left reconstructing what happened from logs, metrics, and guesswork.

The problem is that these signals only capture what someone predicted would matter. The actual execution, i.e. the ground truth, is ephemeral. It ran once and vanished.

Retrace changes that. Record a Python execution in production, replay it deterministically on your laptop, and debug it in VS Code as if it were happening live. Same inputs, same outputs, same execution path, every time.

Today, we're releasing an open-source preview so you can try it yourself.

A real example: AI agent crashes

Your AI application classifies customer intents using an LLM. It works perfectly in testing, but crashes occasionally in production:

Python
# Your code
import json

def classify_intent(message):
    # `llm` is your application's LLM client (defined elsewhere)
    response = llm.chat(prompt=f"Classify: {message}")
    data = json.loads(response)  # <-- JSONDecodeError
    return data["intent"]
The crash:
None
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Why you can't debug it:
  • LLM responses are nondeterministic (the same prompt can return a different response on every call)
  • You can't reproduce the exact failure
  • Logs show the error, not what the LLM actually returned
  • APM shows the stack trace, not the context
With Retrace:

    1. Record the execution:

Shell
RETRACE=1 RETRACE_RECORDING_PATH=crash python app.py

    2. Replay in VS Code:

Shell
code crash/replay.code-workspace # Set breakpoint, press F5

    3. Inspect the exact LLM response:

None
response = "Sure! The intent is billing with high confidence."

Root cause found: The LLM returned conversational text instead of JSON. You can see the exact response that crashed your app, even though the LLM is nondeterministic.
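Replaying the recorded value makes the failure reproducible offline. The sketch below assumes a hypothetical `RecordedLLM` stub that returns the captured string in place of the real client. It shows the effect of replay on `classify_intent`, not how Retrace injects recorded values.

```python
import json


# Hypothetical stand-in for the LLM client: it returns the response captured
# in the recording instead of calling the model.
class RecordedLLM:
    def __init__(self, recorded_response: str):
        self.recorded_response = recorded_response

    def chat(self, prompt: str) -> str:
        return self.recorded_response


llm = RecordedLLM("Sure! The intent is billing with high confidence.")


def classify_intent(message):
    response = llm.chat(prompt=f"Classify: {message}")
    data = json.loads(response)  # fails: the response is prose, not JSON
    return data["intent"]


def reproduce(message: str) -> str:
    """Run classify_intent and return the error message it raises, if any."""
    try:
        classify_intent(message)
    except json.JSONDecodeError as exc:
        return str(exc)
    return "no error"


if __name__ == "__main__":
    # Same recorded input, same failure, on every run.
    print(reproduce("I was charged twice"))
    # Expecting value: line 1 column 1 (char 0)
```

Because the stub always returns the same string, the error is identical on every run, which is exactly what the live LLM could not guarantee.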

Try the full AI observability demo →

Why did this take six years?

Deterministic replay sounds straightforward: capture external inputs, replay them in the same order. In practice, Python has nondeterminism hiding everywhere.

Threading. The GIL runs one thread's bytecode at a time, but the interpreter and the OS scheduler decide when threads switch, so two runs of the same code can interleave differently.

Dictionary and set iteration. Before Python 3.7, dict iteration order was not guaranteed by the language. Since 3.7, dicts preserve insertion order, but sets do not: set iteration order follows hash values, and string hashes are randomized per process by default (see PYTHONHASHSEED). C extensions can add ordering surprises of their own.

Library internals. Many libraries have hidden nondeterminism: connection pooling, retry jitter, lazy initialization, caching keyed by object id. These are invisible until replay diverges.

The observer effect. Debugging itself changes execution. Attaching a debugger, setting breakpoints, even importing certain modules can alter timing and behavior.
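Ordering effects like these are easy to observe with the standard library alone. The sketch below runs the same one-liner in fresh interpreters with different `PYTHONHASHSEED` values (the environment variable CPython uses to seed `str` hashing). Dict order stays fixed by insertion, while set order follows the hashes and can change from one process to the next. The strings and seeds are arbitrary choices for illustration.

```python
import os
import subprocess
import sys

# One-liner whose output depends on string hashing: iterate a set of strings.
SNIPPET = "print(list({'alpha', 'beta', 'gamma', 'delta', 'epsilon', 'zeta'}))"


def set_order(seed: str) -> str:
    """Run SNIPPET in a fresh interpreter with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Dict iteration follows insertion order (guaranteed since Python 3.7).
    print(list({"b": 1, "a": 2, "c": 3}))  # ['b', 'a', 'c']
    # Set iteration follows hash values, so it can differ from seed to seed.
    for seed in ("1", "2", "3"):
        print(seed, set_order(seed))
```

A fixed seed gives a repeatable order. Production processes normally run with a random seed, though, so any code that iterates over a set of strings can take a different path on replay unless that ordering is controlled.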

Early versions of Retrace tried to solve this with shallow hooks that intercepted at the function-call level. That worked for demos but broke on real code. We spent years going deeper: a custom proxy system that captures at the internal/external boundary, a C++ demultiplexer for thread ordering, and ultimately a complete Python bytecode interpreter for analysis.

The result: Retrace can record and replay real Python applications built on Flask, Django, and Requests, with roughly 1% overhead in production.

How it works

Retrace divides your code into two worlds:

Internal code is your application logic. It's deterministic given the same inputs.

External code is everything else: network calls, database queries, filesystem access, time, and randomness.

During recording, Retrace proxies the boundary between these worlds. Every call to external code is intercepted, and both the arguments and results are serialized to a trace file. Your code runs normally; the trace is a side effect.

During replay, the same proxies are active, but instead of making real external calls, they return the recorded results. Your internal code executes identically because it receives identical inputs.

None
Recording:
Your code → [proxy intercepts] → External library → [result recorded] → Your code

Replay:
Your code → [proxy intercepts] → Recorded result → Your code
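The record and replay paths can be sketched in a few lines of plain Python. This is a toy model, not Retrace's implementation or API. The hypothetical `BoundaryProxy` wraps one external function. In record mode it makes the real call and appends the arguments and result to a trace. In replay mode it skips the call and returns the next recorded result, after checking that the arguments match. Here `time.time` and `random.random` stand in for external code, and a JSON string stands in for the trace file.

```python
import json
import random
import time


class BoundaryProxy:
    """Hypothetical wrapper around one external function (toy record/replay)."""

    def __init__(self, name, func, mode, trace):
        self.name, self.func, self.mode, self.trace = name, func, mode, trace

    def __call__(self, *args):
        if self.mode == "record":
            result = self.func(*args)  # real external call
            self.trace.append({"fn": self.name, "args": list(args), "result": result})
            return result
        # Replay: no external call; hand back the next recorded result.
        entry = self.trace.pop(0)
        if entry["fn"] != self.name or entry["args"] != list(args):
            raise RuntimeError(f"replay diverged at {self.name}{args}")
        return entry["result"]


def app(now, rand):
    """Internal code: deterministic given the values the boundary returns."""
    stamp = now()
    roll = rand()
    return f"{stamp:.3f}:{roll:.6f}"


def run(mode, trace):
    return app(BoundaryProxy("time.time", time.time, mode, trace),
               BoundaryProxy("random.random", random.random, mode, trace))


if __name__ == "__main__":
    trace = []
    recorded = run("record", trace)
    saved = json.dumps(trace)  # stands in for the trace file
    replayed = run("replay", json.loads(saved))
    print(recorded == replayed)  # True
```

The replayed run never calls `time.time` or `random.random`, so it produces the same string every time. The argument check is a crude stand-in for detecting that replay has diverged from the recording.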

Threading is handled by a C++ demultiplexer that tracks the original thread interleaving and blocks threads during replay until it's their turn to execute. This reproduces the exact execution order without requiring lock instrumentation.
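Retrace does this in C++. The underlying idea can be sketched in pure Python with `threading.Condition`. In this hypothetical `Turnstile`, the recorded schedule is a list of thread names, one entry per step. Each thread calls `wait_turn` before a step and `done` after it, and blocks until the schedule says it is next. The schedule format and class are illustrative assumptions, not Retrace's internals.

```python
import threading


class Turnstile:
    """Hypothetical replay scheduler: lets threads run steps in a recorded order."""

    def __init__(self, schedule):
        self.schedule = list(schedule)  # e.g. ["A", "B", "A"]: who runs each step
        self.position = 0
        self.cond = threading.Condition()

    def wait_turn(self, name):
        with self.cond:
            # Block until the recorded schedule says this thread goes next.
            self.cond.wait_for(lambda: self.position < len(self.schedule)
                               and self.schedule[self.position] == name)

    def done(self):
        with self.cond:
            self.position += 1
            self.cond.notify_all()  # wake waiting threads so the next one can proceed


def replay(schedule):
    turnstile = Turnstile(schedule)
    log = []

    def worker(name, steps):
        for i in range(steps):
            turnstile.wait_turn(name)
            log.append(f"{name}{i}")  # the step itself
            turnstile.done()

    threads = [threading.Thread(target=worker, args=(n, schedule.count(n)))
               for n in sorted(set(schedule))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    print(replay(["A", "B", "B", "A", "C", "A"]))
    # ['A0', 'B0', 'B1', 'A1', 'C0', 'A2']
```

Run it repeatedly and the log order never changes, because each thread can proceed only when the schedule names it. A real replayer also has to capture that schedule during recording, which this sketch leaves out.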

The result: production executions become portable artifacts you can replay anywhere, with no network, no database, and no credentials.

Try it.

Prerequisites: macOS or Linux, Python 3.11, VS Code

Install:
Shell
python -m pip install --upgrade pip
python -m pip install --upgrade retracesoftware.proxy requests
python -m retracesoftware.autoenable
Record:
Shell
# Run your application with recording enabled
RETRACE=1 RETRACE_RECORDING_PATH=recording python your_app.py

This creates a recording/ directory containing everything needed to replay the execution.

Replay:
Shell
# CLI replay
cd recording/run
python -m retracesoftware --recording ..

# Or debug in VS Code
code recording/replay.code-workspace
# Set breakpoints, press F5

Step through the code. You'll see the same values, the same responses, and the same execution path, yet no network calls are made: the trace supplies every external result.

Try a complete demo:

Supported libraries.

The open-source preview supports:

  • HTTP: Requests
  • Web frameworks: Flask, Django
  • Database: psycopg2 (PostgreSQL)
  • Core: threading, time, random, os.environ

We're expanding coverage based on what users need. If your stack isn't covered, open an issue.

Why most libraries "just work": Retrace operates at Python call boundaries, not inside library internals. C extensions don't need instrumentation: as long as they're called through Python functions and return Python objects, Retrace can record and replay them. See Supported Environments for details.

What's next?

Record-replay is the foundation. The bigger opportunity is answering harder questions: Where did this value come from? How did it get into this state? What was the chain of transformations?

We're building a provenance engine on top of the replay substrate. This will allow you to trace any value back through the entire lineage: select a variable in the debugger and jump to exactly where it was created, what inputs produced it, and how it propagated through your code.

This is "execution intelligence": not just reproducing what happened, but explaining why.
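The following is a toy illustration of the lineage question only; the post does not describe how the provenance engine works. A hypothetical `Traced` value can remember the operation and inputs that produced it, so any result can be walked back to its origins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Traced:
    """Hypothetical value that records how it was produced."""
    value: float
    origin: str          # e.g. "input:price" or "mul"
    parents: tuple = ()  # the Traced values this one was computed from

    def __add__(self, other):
        return Traced(self.value + other.value, "add", (self, other))

    def __mul__(self, other):
        return Traced(self.value * other.value, "mul", (self, other))

    def lineage(self, depth=0):
        """Yield one indented line per step, from this value back to its inputs."""
        yield "  " * depth + f"{self.origin} = {self.value}"
        for parent in self.parents:
            yield from parent.lineage(depth + 1)


if __name__ == "__main__":
    price = Traced(20.0, "input:price")
    qty = Traced(3.0, "input:qty")
    shipping = Traced(5.0, "input:shipping")
    total = price * qty + shipping
    print("\n".join(total.lineage()))
    # add = 65.0
    #   mul = 60.0
    #     input:price = 20.0
    #     input:qty = 3.0
    #   input:shipping = 5.0
```

Wrapping values by hand like this only works for code you control. Answering the same question for arbitrary recorded executions is the harder problem the paragraph above describes.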

The provenance engine will be a commercial product. Record-replay is open source and will stay that way.

Get involved.

We've been working on this for six years. We're excited to finally put it in your hands.

Retrace was invented and built by Nathan Matthews and the Retrace team. We're backed by Preston-Werner Ventures.

Questions? Contact Henry Yates, CEO: henry@retracesoftware.com