⊡ PROVENANCE ENGINE

Trace any value back to its origin.

Replay shows you what happened. Provenance answers the harder question: why. Select any value in a recorded execution and trace it backwards — through every transformation, function call, and assignment — to the input that caused it.

REAL-WORLD EXAMPLE

A Flask API returns the wrong price. Provenance finds the root cause.

A customer reports they were charged £0.00 for an order. The execution was recorded. Here's how provenance traces the bug in three steps — no log searching, no guessing.

A real example: an AI agent crashes in production

Your AI application classifies customer intents using an LLM. It works perfectly in testing, but crashes occasionally in production.

Even when you request structured JSON output, rare edge cases still happen in production: streaming truncation, retries, provider hiccups, tool errors. When they do, reproducing the exact failure is the hard part.

Python

import json

# Your code
def classify_intent(message):
    # `llm` is your LLM client, configured elsewhere.
    # We ask for JSON, but rare non-JSON responses still occur in production.
    response = llm.chat(prompt=f"Classify: {message}")
    data = json.loads(response)  # <-- JSONDecodeError
    return data["intent"]

The crash:

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Why you can't debug it:

The failure is non-deterministic: rerunning locally won’t reproduce the exact response.
Logs rarely include the full prompt/response (cost/PII), and miss surrounding execution state.
APM shows the stack trace and timing, but it can’t recreate the exact external calls and state.

With Retrace:

1. Record the execution:

Shell

RETRACE=1 RETRACE_RECORDING_PATH=crash python app.py

2. Replay in VS Code:

Shell

code crash/replay.code-workspace
# Set a breakpoint, press F5

3. Inspect the exact LLM response that caused the crash (and any retries):

response = "Sure! The intent is billing with high confidence."

Root cause found: the LLM returned conversational text instead of JSON. In replay, you can inspect the exact prompt and model response that triggered the crash (and any retries), then turn this recorded failure into a regression case.
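One way to turn the recorded failure into a regression case is to pin the captured response in a test. The sketch below is illustrative only: `parse_intent` is a hypothetical helper (it is not part of Retrace or of the code above) that separates parsing from the LLM call, and the "unknown" fallback label is an assumption about how you might want to handle prose replies.

```python
import json

# The exact response captured in the recording above.
RECORDED_RESPONSE = "Sure! The intent is billing with high confidence."


def parse_intent(response, default="unknown"):
    """Hypothetical helper: return the intent, or `default` for unusable output."""
    try:
        data = json.loads(response)
    except json.JSONDecodeError:
        # The model answered with prose (or truncated JSON) instead of JSON.
        return default
    if not isinstance(data, dict):
        # Valid JSON, but not the object shape we asked for.
        return default
    return data.get("intent", default)


def test_recorded_prose_response_does_not_crash():
    assert parse_intent(RECORDED_RESPONSE) == "unknown"


def test_valid_json_still_parses():
    assert parse_intent('{"intent": "billing"}') == "billing"
```

Run under pytest, the first test fails against the original `json.loads` call and passes once the fallback is in place, so the production crash stays fixed.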

Try the full AI observability demo →

LLMs make nondeterminism obvious, but the same “can’t reproduce” problem happens with flaky APIs, race conditions, CI-only failures, and prod-only data.

Why this took six years

Deterministic replay sounds straightforward: capture external inputs, replay them in the same order.

In practice, Python has pervasive nondeterminism that even experienced Python developers don't expect:

Threading. Even with the GIL, thread interleaving is nondeterministic. Two runs of the same code can execute bytecode in different orders. We built a C++ demultiplexer that tracks and reproduces thread scheduling at the Python bytecode level without requiring lock instrumentation.

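Dictionary iteration. Before Python 3.7, dict iteration order was not guaranteed by the language (CPython 3.6 preserved insertion order only as an implementation detail). From 3.7 onward, insertion order is guaranteed, but only for operations Python controls. C extensions and hash-dependent behavior (for example, set iteration order, which varies with per-process string hash randomization) can still cause surprises.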
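Hash-dependent behavior is easy to observe. The sketch below is illustrative only and is not Retrace code. It runs the same one-liner in two child interpreters with different `PYTHONHASHSEED` values. String hashes, and therefore anything ordered by them such as set iteration, can differ between the two runs; the helper name is made up for this example.

```python
import os
import subprocess
import sys


def string_hash_in_child(seed):
    """Return hash('billing') as computed by a fresh interpreter with the given seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", "print(hash('billing'))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


if __name__ == "__main__":
    # Same seed: same hash. Different seeds: the hash is expected to change.
    print(string_hash_in_child(1), string_hash_in_child(1))
    print(string_hash_in_child(1), string_hash_in_child(2))
```

Without a fixed seed, CPython picks a random one per process, which is why two production runs of the same code can iterate a set of strings in different orders.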

Library internals. Many libraries have hidden nondeterminism: connection pooling, retry jitter, lazy initialization, caching keyed by object id. These behaviors are invisible during normal execution but cause replay to diverge.

The observer effect. Debugging itself changes execution. Attaching a debugger, setting breakpoints, or importing certain modules can alter timing and behavior.

Early versions of Retrace tried to solve this with shallow hooks that intercepted at the function-call level. That worked for demos but broke on real code. We spent years going deeper: a custom proxy system that captures at the internal/external boundary, a C++ demultiplexer for thread ordering, and ultimately a complete Python bytecode interpreter for provenance analysis.

The result: Retrace can record and replay real Python applications built on Flask, Django, and Requests, with <1% overhead in production (see benchmarks).

How it works

Retrace divides your code into two worlds:

Internal code: Your application logic (deterministic given the same inputs)

External code: Network calls, database queries, filesystem access, time, randomness

During recording, Retrace proxies the boundary between these worlds. Every call to external code is intercepted, and both the arguments and results are serialized to a trace file. Your code runs normally; the trace is a side effect.

During replay, the same proxies are active, but instead of making real external calls, they return the recorded results. Your internal code executes identically because it receives identical inputs.

Recording:

Your code → [proxy intercepts] → External library → [result recorded] → Your code

Replay:

Your code → [proxy intercepts] → Recorded result → Your code
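The record/replay flow above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Retrace's implementation: `make_proxy`, `fetch_price`, and `total_in_pounds` are hypothetical names, and the trace is an in-memory list rather than a serialized trace file.

```python
import functools


def make_proxy(name, external_fn, trace, mode):
    """Wrap an external function so calls are recorded to, or replayed from, `trace`."""

    @functools.wraps(external_fn)
    def proxy(*args, **kwargs):
        if mode == "record":
            result = external_fn(*args, **kwargs)
            trace.append({"call": name, "args": args, "kwargs": kwargs, "result": result})
            return result
        # Replay: consume the next recorded event instead of calling out.
        event = trace.pop(0)
        if event["call"] != name:
            raise RuntimeError(f"replay diverged: expected {event['call']}, got {name}")
        return event["result"]

    return proxy


def fetch_price(sku):
    """Stand-in for an external call (network, database, clock, ...)."""
    return {"sku": sku, "price": 1999}


def total_in_pounds(fetch, skus):
    """Internal code: deterministic given the same external results."""
    return sum(fetch(sku)["price"] for sku in skus) / 100


def unreachable(sku):
    raise AssertionError("external code must not run during replay")


trace = []
recorded = total_in_pounds(make_proxy("fetch_price", fetch_price, trace, "record"), ["a", "b"])
replayed = total_in_pounds(make_proxy("fetch_price", unreachable, trace, "replay"), ["a", "b"])
```

During replay the real function is never called, yet `replayed` equals `recorded` because the internal code receives identical inputs.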

Threading is handled by a C++ demultiplexer that tracks the original thread interleaving and blocks threads during replay until it's their turn to execute. This reproduces the exact execution order without requiring lock instrumentation.
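The idea of blocking threads until it is their turn can be illustrated in pure Python. This is a deliberately simplified sketch: the source describes a C++ component working at the bytecode level, whereas the hypothetical `ReplayScheduler` below only orders explicit, cooperative steps using a condition variable.

```python
import threading


class ReplayScheduler:
    """Let threads proceed only in a previously recorded order."""

    def __init__(self, recorded_order):
        self._order = list(recorded_order)
        self._position = 0
        self._cond = threading.Condition()

    def step(self, thread_name, action):
        with self._cond:
            # Block until the recorded schedule says it is this thread's turn.
            self._cond.wait_for(
                lambda: self._position < len(self._order)
                and self._order[self._position] == thread_name
            )
            action()
            self._position += 1
            self._cond.notify_all()


def replay(recorded_order):
    """Run one thread per name and return the order in which steps executed."""
    scheduler = ReplayScheduler(recorded_order)
    log = []

    def worker(name, steps):
        for _ in range(steps):
            scheduler.step(name, lambda: log.append(name))

    threads = [
        threading.Thread(target=worker, args=(name, recorded_order.count(name)))
        for name in sorted(set(recorded_order))
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return log
```

Calling `replay(["A", "B", "B", "A", "B"])` returns the same sequence on every run, regardless of how the operating system schedules the two threads.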

The result: production executions become portable artifacts you can replay anywhere—no network, no database, no credentials needed.

Architecture.

For readers interested in the technical details, Retrace consists of three main components:

1. Proxy System (Python)

Intercepts calls at the internal/external boundary. We dynamically generate proxy types that wrap external objects and route calls through recording/replay logic. This operates at Python call boundaries, not syscalls.

2. Demultiplexer (C++)

Handles thread ordering. Tracks which thread executed which bytecode instruction during recording. During replay, blocks threads until it's their turn, reproducing the exact interleaving without lock instrumentation.

3. Bytecode Interpreter (C++)

For provenance (coming Q2 2026), we replace CPython's eval loop with a custom interpreter that tracks value lineage. This runs during replay only; recording uses standard CPython.

The key architectural decision: capture Python semantics, not just syscalls. We see function arguments, return values, and object state, not just read/write calls. This trades syscall-level generality for Python-level clarity.
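To make "Python semantics, not syscalls" concrete, here is a minimal sketch of a dynamically generated proxy type. It is an assumption-laden illustration rather than Retrace's proxy system: `make_proxy_type` and `PriceService` are hypothetical, only public methods are wrapped, and events go to a plain list.

```python
def make_proxy_type(cls, events):
    """Build a proxy type for `cls` whose public methods log their calls to `events`."""

    def make_method(method_name):
        def method(self, *args, **kwargs):
            result = getattr(self._target, method_name)(*args, **kwargs)
            # Captured at the Python level: method name, arguments, return value.
            events.append((cls.__name__, method_name, args, kwargs, result))
            return result

        method.__name__ = method_name
        return method

    def init(self, target):
        self._target = target

    namespace = {"__init__": init}
    for attr_name in dir(cls):
        if not attr_name.startswith("_") and callable(getattr(cls, attr_name)):
            namespace[attr_name] = make_method(attr_name)
    # type() creates the proxy class at runtime from the collected methods.
    return type(f"{cls.__name__}Proxy", (object,), namespace)


class PriceService:
    """Stand-in for an external client object."""

    def lookup(self, sku, currency="GBP"):
        return {"sku": sku, "currency": currency, "price": 1999}


events = []
PriceServiceProxy = make_proxy_type(PriceService, events)
service = PriceServiceProxy(PriceService())
quote = service.lookup("sku-1", currency="GBP")
```

After the call, `events` holds one entry with the class name, method name, positional and keyword arguments, and the returned dict, which is far more readable than the raw bytes a syscall-level recorder would see.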

Try it.

Prerequisites: macOS or Linux, Python 3.11, VS Code

Install:

Shell

python -m pip install --upgrade pip
python -m pip install --upgrade retracesoftware.proxy requests
python -m retracesoftware.autoenable

Record:

Shell

# Run your application with recording enabled
RETRACE=1 RETRACE_RECORDING_PATH=recording python your_app.py

This creates a recording/ directory containing everything needed to replay the execution.

Replay:

Shell

# CLI replay
cd recording/run
python -m retracesoftware --recording ..

# Or debug in VS Code
code recording/replay.code-workspace
# Set breakpoints, press F5

Step through the code. You'll see the same values, the same responses, the same execution—no network calls are being made. The trace contains everything.

Try a complete demo:

Quickstart: Flask crash replay (10 minutes)
AI observability: Debug LLM failures (15 minutes)

Supported libraries.

The open-source preview supports:

HTTP: Requests
Web frameworks: Flask, Django
Database: psycopg2 (PostgreSQL)
Core: threading, time, random, os.environ

If your stack isn't covered, open an issue; we're expanding based on user needs.

Why most libraries "just work": Retrace operates at Python call boundaries, not inside library internals. C extensions don't need instrumentation: as long as they're called via Python functions and return Python types, Retrace can record and replay them. See Supported Environments for details.

What's next?

Record-replay is the foundation. The bigger opportunity: answering why.

Where did this value come from?
How did it get into this state?
What was the chain of transformations?

We're building a provenance engine that traces any value back through its entire lineage. Select a variable in the debugger → jump to where it was created → see what inputs produced it → follow how it propagated through your code.
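As a rough intuition for value lineage, the toy below wraps numbers so that each result remembers the operation and inputs that produced it. It is not how the provenance engine works (the source describes a custom bytecode interpreter, not operator overloading), and the pricing scenario and origin labels are invented for illustration.

```python
class Traced:
    """A number that remembers which operation and inputs produced it."""

    def __init__(self, value, origin, parents=()):
        self.value = value
        self.origin = origin
        self.parents = tuple(parents)

    def _combine(self, other, op_name, op):
        if not isinstance(other, Traced):
            other = Traced(other, f"literal {other!r}")
        return Traced(op(self.value, other.value), op_name, (self, other))

    def __mul__(self, other):
        return self._combine(other, "multiply", lambda a, b: a * b)

    def __sub__(self, other):
        return self._combine(other, "subtract", lambda a, b: a - b)

    def lineage(self, depth=0):
        """Return indented lines, walking from this value back to its inputs."""
        lines = [f"{'  ' * depth}{self.value!r} <- {self.origin}"]
        for parent in self.parents:
            lines.extend(parent.lineage(depth + 1))
        return lines


# Invented inputs: a price in pence, a quantity, and a discount equal to the price.
unit_price = Traced(1999, "db row: products.price_pence")
quantity = Traced(1, "request body: quantity")
discount = Traced(1999, "request body: discount_pence")
total = unit_price * quantity - discount

print("\n".join(total.lineage()))
```

Reading the printed lineage from the top, the zero total traces back through the subtraction to the discount value supplied in the request, which is the kind of question a provenance query is meant to answer.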

This is "execution intelligence": not just reproducing what happened, but explaining why.

Commercial model:

Record-replay: Open source (Apache 2.0), always free
Provenance engine: Commercial product, shipping Q2 2026

We're also building an MCP server so AI tools can query execution traces directly for debugging assistance.

Get involved.

GitHub – Star the repo, contribute, or just explore

Docs – Complete guides and API reference.

Issues – Report bugs or request features.

Discussions – Get help, share ideas, or discuss use cases.

We've been working on this for six years. We're excited to finally put it in your hands.

Retrace was invented and built by Nathan Matthews and the Retrace team. Backed by Preston-Werner Ventures.

Questions? Contact Henry Yates, CEO: henry@retracesoftware.com
