Replay the failed Python run, not just the stack trace.

Retrace records a failed Python test or CI run as a deterministic replay. Open it in VS Code, step backwards from the failure, and inspect the runtime state that actually happened.

pip install retracesoftware

Open source · CPython 3.11+ · Pytest in one command · VS Code replay

A failed pytest run replayed in VS Code. Step backwards from the exception and inspect the value that caused it.

Failed runs become artifacts

Keep failed pytest and CI runs as .retrace files.

See benchmarks →

Replay in VS Code

Open the failed run locally and debug the execution that actually happened.

Step backwards

Move backwards from the exception to the runtime state that caused it.

Runtime facts for AI agents

Give your AI coding agent real stack frames and values — not just logs and tracebacks.

No test rewrite

Wrap your existing pytest command. No special test harness required.

Production path

Start in CI. Use the same recording model for production failures when ready.

QUICK START
Replay a failed pytest run.
Run your tests normally. If they fail, keep the failed execution as a replayable .retrace artifact.
Step 1 - install
Shell
python -m venv .venv
source .venv/bin/activate

python -m pip install retracesoftware

No app rewrite required.
Step 2 - record pytest
Run your tests normally with one environment variable.
Record
mkdir -p recordings
RETRACE_RECORDING=recordings/failed-run.retrace \
  python -m pytest

If the test passes, discard the recording.

If it fails, keep it.
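
The keep-on-failure rule can also be scripted. The following is a minimal sketch, not part of Retrace itself: `run_with_recording` is a hypothetical helper name, and it assumes the recording is written as a single file at the path given in `RETRACE_RECORDING`, as in the shell example above.

```python
import os
import subprocess
from pathlib import Path


def run_with_recording(command, recording_path):
    """Run `command` with RETRACE_RECORDING set; keep the file only on failure."""
    recording = Path(recording_path)
    recording.parent.mkdir(parents=True, exist_ok=True)

    # Same environment variable as the shell example; the path is our choice.
    env = dict(os.environ, RETRACE_RECORDING=str(recording))
    result = subprocess.run(command, env=env)

    # A passing run has nothing to debug, so drop its recording.
    if result.returncode == 0 and recording.exists():
        recording.unlink()
    return result.returncode
```

Calling `run_with_recording([sys.executable, "-m", "pytest"], "recordings/failed-run.retrace")` would mirror the command above and return pytest's exit code, so a wrapper script can pass it on to the shell.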
Step 3 - replay
Open the recording in VS Code and debug the original execution.
Replay
code .

# Open recordings/failed-run.retrace
# Start replay from the Retrace sidebar
# Step backwards from the failure

No live test process required. You are debugging the recorded execution.
Want to see this end-to-end with a real example? Try the 10-Minute Demo.
CI artifact
Keep failed CI runs as replayable artifacts.

Run pytest under Retrace in CI. If the job passes, ignore the recording. If it fails, upload the `.retrace` file as a build artifact.

Now the failed run does not disappear when the CI process exits. A developer can replay it locally in VS Code, step backwards from the failure, and inspect the runtime state that caused it.

An AI coding agent can use the same artifact as runtime context instead of guessing from logs and a traceback.

1. pytest in CI
2. failure
3. .retrace artifact
4. VS Code replay
Works as a plain CI artifact. No platform-specific plugin required.
GitHub Actions snippet
- name: Run pytest with Retrace
  run: |
    mkdir -p recordings
    RETRACE_RECORDING=recordings/failed-run.retrace python -m pytest

- name: Upload Retrace recording
  if: failure()
  uses: actions/upload-artifact@v4
  with:
    name: retrace-failed-run
    path: recordings/failed-run.retrace
What makes this different
A stack trace tells you where Python crashed.
Retrace lets you replay the run that crashed.
| | Today | With Retrace |
| --- | --- | --- |
| CI artifacts | CI artifacts are logs and tracebacks | CI artifact is replayable |
| AI agents | AI agents infer from partial context | AI agents get runtime evidence |
| Failure | Stack trace shows where it crashed | Replay shows what happened before |
| What gets preserved | Logs show what you predicted would matter | Retrace preserves the failed execution |
The failed execution becomes something you can inspect, replay, and share.
PROVENANCE ENGINE · EARLY ACCESS
Replay shows you what happened.
Provenance shows you why.

A recording lets you step through the execution. But when you're staring at a wrong value, the real question is: where did it come from?

Retrace's provenance engine traces any value back through the execution — from the point you noticed it, through every transformation, to the original input that caused it.

  • Select any value. Jump to its origin.

    Click a variable in the debugger and instantly see the exact line and inputs that produced it.

  • Chain backwards through transformations.

    Each origin has its own provenance. Keep drilling back until you reach the root cause.

  • Works on every value, not just outputs.

    Intermediate variables, function returns, container mutations — provenance covers everything in the execution.

Now in early access with select design partners.

 

PROVENANCE DRILLBACK

Three clicks from ZeroDivisionError to root cause: the API caller sent qty: "0" in the request body. No manual searching. No log correlation.
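
To make the drillback idea concrete, here is a toy model in Python. It is only an illustration and not how Retrace implements provenance: the `Traced` class, its fields, and the request body are hypothetical, and each value is wrapped by hand, whereas the text above describes provenance being available on a recording without code changes.

```python
from dataclasses import dataclass


@dataclass
class Traced:
    """A value plus a note on how it was produced and from which inputs."""
    value: object
    how: str
    parents: tuple = ()

    def origin_chain(self):
        """Walk back from this value to its root input, oldest last."""
        chain, node = [], self
        while node is not None:
            chain.append(f"{node.how} -> {node.value!r}")
            node = node.parents[0] if node.parents else None
        return chain


# Hypothetical request body, matching the qty: "0" example above.
body = Traced({"qty": "0"}, "request body")
raw_qty = Traced(body.value["qty"], "body['qty']", (body,))
qty = Traced(int(raw_qty.value), "int(raw_qty)", (raw_qty,))

try:
    unit_price = 100 / qty.value
except ZeroDivisionError:
    # Three steps back: the int, the string it came from, the request body.
    for step in qty.origin_chain():
        print(step)
```

Each printed step corresponds to one drillback: the failing integer, the string it was parsed from, and the request body that supplied it.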

Q&A Section
Getting started
Is it hard to set up?
No. Retrace is a pip-installed agent. Set an env var, run your app, and you’re recording. No code changes.
Do I need to change my code?
No. Retrace attaches at the Python runtime level and works with your existing app. No logging, decorators, or special hooks.
What Python versions and frameworks work?
Preview supports Python 3.11, with Django/Flask and 60+ popular libraries tested.
Python 3.12 support is in progress, with broader coverage planned before GA.
Production Concerns
Can I run Retrace safely in production?
Yes. Retrace is built for production use.

It records at the Python runtime layer (not ptrace/libc), which keeps it safe for live workloads.
How much overhead does it add?
Measured latency overhead is ~1% or less on typical Django/Flask workloads. See the published benchmarks for details.
How does Retrace handle sensitive data?
Retrace records execution and I/O — you control where traces live and who can access them. For stricter environments, traces can stay local/on-prem.
How It’s Different
Why can’t I just re-run the request?
Because many production failures depend on timing, concurrency, external services, or non-deterministic behavior.

A re-run often takes a different path.

Retrace lets you debug the exact execution that happened, after the fact.
How is this different from logging or APM tools?
Logs/APM show symptoms and depend on what you instrument. They can’t reconstruct past state.

Retrace records the real execution and lets you replay it deterministically, so you can inspect the actual code path and state.
Can it catch race conditions and flaky tests?
Yes. Retrace captures timing and thread interactions and replays them deterministically. This helps reproduce race conditions and flaky CI failures by replaying the run that failed.
Open Source & Community
Is it really open source?
Yes. The Record-Replay core is open source under Apache 2.0.
Why is this a preview release?
We’re opening the agent early to gather feedback while we expand Python/library coverage and harden for GA.
How can I contribute?
Try the agent, file issues, and submit PRs on GitHub. Library compatibility reports and docs fixes are great first contributions.
How does it work?

Retrace records external interactions (DB, API calls, file I/O, time) during a real run, then replays them deterministically in your local debugger — no prod access needed.

1. App runs normally: your production app keeps running as usual.
2. External calls captured automatically: when a bug happens, Retrace captures it.
3. Debug the exact execution locally: replay in VS Code.
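
The record-then-replay idea described in this section can be illustrated with a toy model. This is a sketch of the general technique only, not Retrace's implementation: `Recorder`, `Replayer`, and `handle_request` are hypothetical names, and the code passes an `env` object explicitly, whereas Retrace is described as attaching at the Python runtime level with no code changes.

```python
import random
import time


class Recorder:
    """Calls the real function and logs each result in order."""

    def __init__(self):
        self.log = []

    def call(self, name, func, *args):
        result = func(*args)
        self.log.append((name, result))
        return result


class Replayer:
    """Returns logged results in order instead of calling the real function."""

    def __init__(self, log):
        self._entries = iter(log)

    def call(self, name, func, *args):
        recorded_name, result = next(self._entries)
        if recorded_name != name:
            raise RuntimeError(f"replay diverged: expected {recorded_name}, got {name}")
        return result


def handle_request(env):
    # Both calls are non-deterministic: a plain re-run would see new values.
    started = env.call("time.time", time.time)
    jitter = env.call("random.random", random.random)
    return f"{started:.3f}:{jitter:.6f}"


recorder = Recorder()
original = handle_request(recorder)                 # real run, results logged
replayed = handle_request(Replayer(recorder.log))   # replay, nothing real is called
assert replayed == original
```

Because the replay reads every external result from the log, it follows the same path as the original run even though the clock and the random generator have moved on.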
Use cases

Perfect for:

  • Debug production-only bugs you can’t reproduce
    Replay the exact execution that already happened. No repro steps required.
  • Reproduce race conditions and timing-sensitive failures
    Capture and deterministically replay concurrency, async behavior, and thread interactions.
  • Stabilise flaky CI tests
    Replay the exact failing run to understand and fix non-deterministic test failures.
  • Debug systems with external dependencies
    Reproduce failures involving databases, APIs, file I/O, and other external services.
  • Investigate failures after the fact
    Inspect real code paths and state from incidents that are already over.
  • Let AI agents debug your production failures
    Retrace's DAP integration lets tools like Claude Code and Cursor step through recorded executions programmatically, including backwards.
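
For readers unfamiliar with DAP (the Debug Adapter Protocol), the sketch below shows what a programmatic "step backwards" looks like on the wire. It uses the standard DAP framing and the standard `stepBack` request. Which requests Retrace's adapter accepts is not stated here, so treat that as an assumption; `encode_dap_request` is a hypothetical helper name.

```python
import json


def encode_dap_request(seq, command, arguments):
    """Frame a DAP request: Content-Length header, blank line, JSON body."""
    body = json.dumps({
        "seq": seq,
        "type": "request",
        "command": command,
        "arguments": arguments,
    }).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


# "stepBack" is the standard DAP request for stepping one step backwards.
# threadId=1 is a placeholder; a real client reads it from a "threads" response.
message = encode_dap_request(seq=1, command="stepBack", arguments={"threadId": 1})
print(message.decode("utf-8"))
```

An agent would write such messages to the debug adapter's input stream and read responses and events framed the same way.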
Open Source
Community Discussion
Documentation
Report Bugs

Ready to debug the bugs you can’t reproduce?
Join the future of Python debugging (Preview Release - Python 3.11 & 3.12, <0.1% recording overhead)