14 January, 2026

Introducing Retrace:
Deterministic Record-Replay for Python

Production bugs are maddening because you can't reproduce them. The request that triggered a crash is gone. The database state has moved on. The external API that returned something unexpected is now returning something else. You're left reconstructing what happened from logs, metrics, and guesswork.

The problem is that these signals only capture what someone predicted would matter. The actual execution, i.e. the ground truth, is ephemeral. It ran once and vanished.

Retrace changes that. Record a Python execution in production, replay it deterministically on your laptop, and debug it in VS Code as if it were happening live. Same inputs, same outputs, same execution path, every time.

Today, we're releasing an open-source preview so you can try it yourself.

A real example: AI agent crashes

Your AI application classifies customer intents using an LLM. It works perfectly in testing, but crashes occasionally in production:

Python

    # Your code (llm is your LLM client object)
    import json

    def classify_intent(message):
        response = llm.chat(prompt=f"Classify: {message}")
        data = json.loads(response)  # <-- JSONDecodeError
        return data["intent"]

The crash:

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Why you can't debug it:

LLM responses are nondeterministic (different every time)
You can't reproduce the exact failure
Logs show the error, not what the LLM actually returned
APM shows the stack trace, not the context

With Retrace:

1. Record the execution:

Shell

RETRACE=1 RETRACE_RECORDING_PATH=crash python app.py

2. Replay in VS Code:

Shell

    code crash/replay.code-workspace
    # Set breakpoint, press F5

3. Inspect the exact LLM response:

response = "Sure! The intent is billing with high confidence."

Root cause found: The LLM returned conversational text instead of JSON. You can see the exact response that crashed your app, even though the LLM is nondeterministic.

Try the full AI observability demo →

Why this took six years

Deterministic replay sounds straightforward: capture external inputs, replay them in the same order. In practice, Python has nondeterminism hiding everywhere.

Threading. Even with the GIL, thread interleaving is nondeterministic. Two runs of the same code can execute bytecode in different orders.

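Dictionary iteration. Before Python 3.7, dict iteration order was not guaranteed by the language. From 3.7 onward, dicts preserve insertion order, but that guarantee does not extend to every container: set iteration order still depends on hash values, and string hashes are randomized per process by default. C extensions with their own data structures can add further surprises.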
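To make the hashing point concrete, here is a small sketch that is independent of Retrace. It runs the same one-line program in fresh interpreters and compares the iteration order of a set of strings. The helper name set_order and the sample strings are made up for illustration; PYTHONHASHSEED is the standard CPython environment variable that controls string hash randomization.

```python
import os
import subprocess
import sys

# Child program: print the iteration order of a set of strings.
CHILD = "print(list({'alpha', 'beta', 'gamma', 'delta', 'epsilon'}))"


def set_order(seed: str) -> str:
    """Run CHILD in a fresh interpreter with PYTHONHASHSEED set to `seed`."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


# Same seed: the order is reproducible across processes.
same_seed_runs = {set_order("1") for _ in range(3)}

# Different seeds: the order is allowed to change from process to process.
varied_runs = {set_order(str(seed)) for seed in range(1, 6)}

print(len(same_seed_runs), len(varied_runs))
```

With a fixed seed the set always prints in the same order. With varying seeds, which is what an unconfigured production process effectively has, the order can differ between runs, so any code path that depends on it can diverge on replay.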

Library internals. Many libraries have hidden nondeterminism: connection pooling, retry jitter, lazy initialization, caching keyed by object id. These are invisible until replay diverges.

The observer effect. Debugging itself changes execution. Attaching a debugger, setting breakpoints, even importing certain modules can alter timing and behavior.

Early versions of Retrace tried to solve this with shallow hooks that intercepted at the function-call level. That worked for demos but broke on real code. We spent years going deeper: a custom proxy system that captures at the internal/external boundary, a C++ demultiplexer for thread ordering, and ultimately a complete Python bytecode interpreter for analysis.

The result: Retrace can record and replay real Python applications built on Flask, Django, and Requests, with roughly 1% overhead in production.

How it works

Retrace divides your code into two worlds:

Internal code is your application logic. It's deterministic given the same inputs.

External code is everything else: network calls, database queries, filesystem access, time, and randomness.

During recording, Retrace proxies the boundary between these worlds. Every call to external code is intercepted, and both the arguments and results are serialized to a trace file. Your code runs normally; the trace is a side effect.

During replay, the same proxies are active, but instead of making real external calls, they return the recorded results. Your internal code executes identically because it receives identical inputs.

    Recording:
    Your code → [proxy intercepts] → External library → [result recorded] → Your code

    Replay:
    Your code → [proxy intercepts] → Recorded result → Your code
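The record and replay flow can be sketched in a few lines of plain Python. This is a toy illustration of the idea, not Retrace's implementation: the Boundary class, its wrap method, the trace format, and make_ticket are all hypothetical names invented for this example.

```python
import json
import random
import time


class Boundary:
    """Hypothetical proxy layer: records external results, or replays them."""

    def __init__(self, mode, trace=None):
        self.mode = mode                      # "record" or "replay"
        self.trace = trace if trace is not None else []
        self._cursor = 0                      # next trace entry to replay

    def wrap(self, name, func):
        def proxy(*args):
            if self.mode == "record":
                result = func(*args)          # real external call
                self.trace.append(
                    {"call": name, "args": list(args), "result": result}
                )
                return result
            entry = self.trace[self._cursor]  # replay: no external call is made
            self._cursor += 1
            if entry["call"] != name or entry["args"] != list(args):
                raise RuntimeError(f"replay diverged at {name}{args}")
            return entry["result"]
        return proxy


def make_ticket(now, rand):
    """Internal code: deterministic given the values it gets from outside."""
    return f"{int(now())}-{int(rand() * 1_000_000):06d}"


# Record: the external calls really run, and their results land in the trace.
recorder = Boundary("record")
recorded = make_ticket(recorder.wrap("time", time.time),
                       recorder.wrap("random", random.random))

# The trace is plain data, so it can be serialized and moved to another machine.
trace_file = json.dumps(recorder.trace)

# Replay: the same internal code runs against recorded results only.
replayer = Boundary("replay", json.loads(trace_file))
replayed = make_ticket(replayer.wrap("time", lambda: 0.0),
                       replayer.wrap("random", lambda: 0.0))

print(recorded == replayed)
```

During replay the stand-in lambdas are never called. The internal function sees exactly the values it saw during recording, which is why it produces the same ticket. The divergence check is the toy version of noticing that replay has left the recorded path.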

Threading is handled by a C++ demultiplexer that tracks the original thread interleaving and blocks threads during replay until it's their turn to execute. This reproduces the exact execution order without requiring lock instrumentation.
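The turn-taking idea can be illustrated in pure Python. The sketch below is an assumption-laden toy, not the C++ demultiplexer: TurnGate, step, and the recorded schedule are hypothetical, and the real system works at a much finer granularity than whole function calls.

```python
import threading


class TurnGate:
    """Hypothetical scheduler: lets threads proceed in a recorded order."""

    def __init__(self, schedule):
        self._schedule = list(schedule)   # thread names, in recorded order
        self._position = 0
        self._cond = threading.Condition()

    def _my_turn(self, name):
        return (self._position < len(self._schedule)
                and self._schedule[self._position] == name)

    def step(self, name, action):
        with self._cond:
            # Block until the recorded schedule says it is this thread's turn.
            self._cond.wait_for(lambda: self._my_turn(name))
            action()
            self._position += 1
            self._cond.notify_all()


def replay(schedule):
    gate = TurnGate(schedule)
    log = []

    def worker(name, steps):
        for i in range(steps):
            gate.step(name, lambda: log.append(f"{name}:{i}"))

    threads = [
        threading.Thread(target=worker, args=(name, schedule.count(name)))
        for name in sorted(set(schedule))
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


recorded_order = ["A", "B", "B", "A", "B", "A"]
first = replay(recorded_order)
second = replay(recorded_order)
print(first)
```

Every run produces the same interleaving because each thread waits at the gate until the schedule reaches its name. The application code itself needs no extra locks. The ordering is imposed from outside.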

The result: production executions become portable artifacts you can replay anywhere—no network, no database, no credentials needed.

Try it.

Prerequisites: macOS or Linux, Python 3.11, VS Code

Install:

Shell

    python -m pip install --upgrade pip
    python -m pip install --upgrade retracesoftware.proxy requests
    python -m retracesoftware.autoenable

Record:

Shell

    # Run your application with recording enabled
    RETRACE=1 RETRACE_RECORDING_PATH=recording python your_app.py

This creates a recording/ directory containing everything needed to replay the execution.

Replay:

Shell

    # CLI replay
    cd recording/run
    python -m retracesoftware --recording ..

    # Or debug in VS Code
    code recording/replay.code-workspace
    # Set breakpoints, press F5

Step through the code. You'll see the same values, the same responses, the same execution—no network calls are being made. The trace contains everything.

Try a complete demo:

Quickstart: Flask crash replay (10 minutes)
AI observability: Debug LLM failures (15 minutes)

Supported libraries.

The open-source preview supports:

HTTP: Requests
Web frameworks: Flask, Django
Database: psycopg2 (PostgreSQL)
Core: threading, time, random, os.environ

We're expanding coverage based on what users need. If your stack isn't covered, open an issue.

Why most libraries "just work": Retrace operates at Python call boundaries, not inside library internals. C extensions don't need instrumentation—as long as they're called via Python functions and return Python types, Retrace can record and replay them. See Supported Environments for details.
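A rough way to see why the implementation language does not matter: a wrapper placed at the call boundary only sees arguments going in and a Python value coming out. The sketch below is not Retrace's proxy. The record and replay helpers and py_checksum are hypothetical, and pickle stands in for whatever serialization the real trace uses. It treats a C-implemented function (zlib.crc32) and a pure Python function identically.

```python
import pickle
import types
import zlib


def record(func, log):
    """Wrap any callable and append its pickled result to `log`."""
    def proxy(*args, **kwargs):
        result = func(*args, **kwargs)
        log.append(pickle.dumps(result))  # result must be a picklable Python value
        return result
    return proxy


def replay(log):
    """Return a stand-in that yields the recorded results in order."""
    results = iter(log)

    def proxy(*args, **kwargs):
        return pickle.loads(next(results))
    return proxy


def py_checksum(data):
    """A checksum implemented in pure Python, for comparison."""
    return sum(data) % 65521


log = []
for func in (zlib.crc32, py_checksum):   # zlib.crc32 is implemented in C
    record(func, log)(b"retrace")

replayed = replay(log)
print(isinstance(zlib.crc32, types.BuiltinFunctionType))
```

The wrapper never looks inside either function. As long as the call goes through a Python callable and the result is an ordinary Python object, the same recording path applies.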

What's next?

Record-replay is the foundation. The bigger opportunity is answering harder questions: Where did this value come from? How did it get into this state? What was the chain of transformations?

We're building a provenance engine on top of the replay substrate. This will allow you to trace any value back through its entire lineage: select a variable in the debugger and jump to exactly where it was created, what inputs produced it, and how it propagated through your code.

This is "execution intelligence": not just reproducing what happened, but explaining why.

The provenance engine will be a commercial product. Record-replay is open source and will stay that way.

Get involved.

Show us some love.

github.com/retracesoftware/retrace

Documentation.

github.com/retracesoftware/retrace/tree/main/docs

Issues/Feedback.

github.com/retracesoftware/retrace/issues

Discussion.

github.com/retracesoftware/retrace/discussions

We've been working on this for six years. We're excited to finally put it in your hands.

Retrace was invented and built by Nathan Matthews and the Retrace team. We're backed by Preston-Werner Ventures.

Questions? Contact Henry Yates, CEO: henry@retracesoftware.com
