Debug production failures locally, even when you can’t reproduce them.

Production-safe record-replay for Python. Capture real executions and replay them deterministically.

pip install retracesoftware.proxy

PREVIEW RELEASE: Apache-2.0 OSS · ≤2% overhead · Deterministic replay · Python 3.11/3.12 · Backed by PWV

Production-safe recording

Minimal overhead. Safe in production.

Deterministic replay

Reproduce the exact execution, every time.

VS Code debugging

Debug a past production execution.

Captures external interactions

Network, DB, time, and file I/O are recorded for replay.

No code changes

Works without rewriting your app.

Benchmarked overhead

See benchmarks →
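The "Captures external interactions" and "No code changes" points rest on one general idea: intercept calls to non-deterministic sources from outside the app, log their results while recording, and feed the same results back during replay. Below is a minimal sketch of that idea for `time.time()` only, using plain module patching. It illustrates the general technique, not how Retrace itself is implemented, and `record_time`, `replay_time`, and `app_handler` are hypothetical names.

```python
import time
from contextlib import contextmanager
from typing import Iterator


def app_handler() -> float:
    # Unmodified "application" code: it looks up time.time on every call.
    start = time.time()
    sum(range(100_000))  # some work whose duration varies from run to run
    return time.time() - start


@contextmanager
def record_time(log: list[float]) -> Iterator[None]:
    """Record mode: call the real clock and remember every value it returns."""
    real_time = time.time

    def recording_time() -> float:
        value = real_time()
        log.append(value)
        return value

    time.time = recording_time  # patch the module attribute, not the app's code
    try:
        yield
    finally:
        time.time = real_time  # always restore the real clock


@contextmanager
def replay_time(log: list[float]) -> Iterator[None]:
    """Replay mode: return the recorded values in order instead of reading the clock."""
    real_time = time.time
    values = iter(log)
    time.time = lambda: next(values)
    try:
        yield
    finally:
        time.time = real_time


log: list[float] = []
with record_time(log):
    recorded = app_handler()
with replay_time(log):
    replayed = app_handler()
print(recorded == replayed)  # True: replay sees exactly the recorded clock values
```

One caveat of this sketch: patching `time.time` only affects code that looks the function up through the module at call time. Code that ran `from time import time` earlier still holds the original function, which is why a general-purpose tool needs a more thorough interception mechanism than this.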

Quick Start
Step 1 - install
# Install Retrace into the same Python environment as your app.
python -m pip install --upgrade pip
python -m pip install retracesoftware.proxy retracesoftware.autobundle

# Installs the Retrace runtime
# Automatically configures support for Python 3.11 / 3.12
# No code changes required
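Because Retrace must be installed into the same interpreter that runs your app, it can help to confirm which interpreter `python` resolves to and whether its version is one this preview lists (3.11 or 3.12). The helper below is a hypothetical convenience, not part of Retrace; it only uses the standard `sys` module.

```python
import sys
from typing import Sequence

# Versions listed for this preview release; adjust if the support matrix changes.
SUPPORTED = {(3, 11), (3, 12)}


def check_interpreter(version_info: Sequence[int] = sys.version_info) -> str:
    """Return the interpreter path, or raise if its major.minor version is unsupported."""
    major, minor = version_info[0], version_info[1]
    if (major, minor) not in SUPPORTED:
        raise RuntimeError(
            f"Python {major}.{minor} is not in the preview's supported set {sorted(SUPPORTED)}"
        )
    return sys.executable


if __name__ == "__main__":
    try:
        print("OK:", check_interpreter())
    except RuntimeError as err:
        print("Warning:", err)
```

Run it with the same `python` you used for `pip install`; the printed path is the environment Retrace was installed into.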
Step 2 - record
Recording is safe in production and adds minimal overhead. You only need one recording: the one that captures the real failing execution.
# Run your app under Retrace so that the failing execution is recorded when the bug occurs.
RETRACE_RECORDING_PATH=./recordings \
  python your_app.py

# Your app runs normally
# External interactions (DB, network, time, file I/O) are captured
# Timing and concurrency are preserved
# The recording is written to disk
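The last comment above says the recording is written to disk. Retrace's actual on-disk format is not described here, so the sketch below assumes a simple stand-in: one JSON object per captured interaction, appended to a hypothetical `events.jsonl` file inside the recording directory. `JsonlRecorder` is an invented name for illustration only.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Union


class JsonlRecorder:
    """Hypothetical recorder: appends one JSON line per captured external interaction."""

    def __init__(self, directory: Union[str, Path]) -> None:
        self.path = Path(directory) / "events.jsonl"
        self.path.parent.mkdir(parents=True, exist_ok=True)

    def record(self, kind: str, value: Any) -> None:
        # Open in append mode per event so events written before a crash stay on disk.
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps({"kind": kind, "value": value}) + "\n")

    def load(self) -> list[dict[str, Any]]:
        # Read the events back in the order they were written.
        with self.path.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    rec = JsonlRecorder(Path(tmp) / "recordings")
    rec.record("time", 1700000000.25)
    rec.record("db", [{"id": 1, "status": "paid"}])
    events = rec.load()
print(events[1]["value"])  # [{'id': 1, 'status': 'paid'}]
```

An append-only, line-oriented log keeps recording cheap (no rewriting of earlier data) and keeps the event order explicit, which is what replay depends on.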
Step 3 - replay
This is not a re-run. It’s a faithful replay of the original production failure.
# Replay the recorded execution in a fully deterministic environment.
retrace run ./recordings
# Or open it directly in VS Code:
code ./recordings/retrace.replay

# Set breakpoints
# Step through code
# Inspect variables and control flow
# Debug the exact execution that already happened
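Deterministic replay also implies a consistency check: if the replayed code asks for a different interaction than the one recorded at that point, the run has diverged and the recorded values no longer apply. A minimal sketch of that check over a list of `{"kind", "value"}` events (an assumed format, not Retrace's) might look like this; `Replayer` and `ReplayDivergence` are hypothetical names.

```python
from typing import Any


class ReplayDivergence(RuntimeError):
    """Raised when replayed code requests a different interaction than was recorded."""


class Replayer:
    def __init__(self, events: list[dict[str, Any]]) -> None:
        self.events = events
        self.cursor = 0  # index of the next recorded event to hand out

    def take(self, kind: str) -> Any:
        if self.cursor >= len(self.events):
            raise ReplayDivergence(f"no recorded event left for {kind!r}")
        event = self.events[self.cursor]
        if event["kind"] != kind:
            raise ReplayDivergence(
                f"event {self.cursor}: recorded {event['kind']!r}, replay asked for {kind!r}"
            )
        self.cursor += 1
        return event["value"]  # recorded result; the real service is never called


events = [
    {"kind": "time", "value": 1700000000.25},
    {"kind": "db", "value": [{"id": 1, "status": "paid"}]},
]
replay = Replayer(events)
print(replay.take("time"))  # 1700000000.25
print(replay.take("db")[0]["status"])  # paid
```

Failing loudly on divergence is a deliberate choice: silently returning a mismatched value would let the replay drift away from what actually happened in production.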
Want to see this end-to-end with a real example? Try the 10-minute demo.
Q&A Section
Getting started
Is it hard to set up?
No. Retrace is a pip-installed agent. Set an env var, run your app, and you’re recording. No code changes.
Do I need to change my code?
No. Retrace attaches at the Python runtime level and works with your existing app. No logging, decorators, or special hooks.
What Python versions and frameworks work?
Preview supports Python 3.11, with Django/Flask and 60+ popular libraries tested.
Python 3.12 support is in progress, with broader coverage planned before GA.
Production Concerns
Can I run Retrace safely in production?
Yes. Retrace is built for production use.

It records at the Python runtime layer (not ptrace/libc), which keeps it safe for live workloads.
How much overhead does it add?
Measured latency overhead is ~1% or less on typical Django/Flask workloads. Benchmarks are available here.
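To check figures like these against your own workload, one simple approach is to time the same requests with and without recording and compare median latencies. The hypothetical helper below only does the arithmetic; how you collect the two samples (load tool, test suite, request logs) is up to you, and the numbers in the usage lines are made up for illustration.

```python
from statistics import median


def overhead_percent(baseline_ms: list[float], recorded_ms: list[float]) -> float:
    """Relative increase in median latency when recording, as a percentage of the baseline."""
    base = median(baseline_ms)
    return (median(recorded_ms) - base) / base * 100


# Illustrative numbers only: per-request latencies in milliseconds.
baseline = [100.0, 102.0, 98.0, 101.0, 99.0]
recorded = [101.0, 103.0, 99.0, 102.0, 100.0]
print(f"{overhead_percent(baseline, recorded):.1f}%")  # 1.0%
```

Medians are used rather than means so that a few slow outliers (cold caches, GC pauses) don't dominate the comparison; tail latency can be compared the same way using a high percentile instead of the median.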
How does Retrace handle sensitive data?
Retrace records execution and I/O — you control where traces live and who can access them. For stricter environments, traces can stay local/on-prem.
How It’s Different
Why can’t I just re-run the request?
Because many production failures depend on timing, concurrency, external services, or non-deterministic behavior.

A re-run often takes a different path.

Retrace lets you debug the exact execution that happened, after the fact.
How is this different from logging or APM tools?
Logs/APM show symptoms and depend on what you instrument. They can’t reconstruct past state.

Retrace records the real execution and lets you replay it deterministically, so you can inspect the actual code path and state.
Can it catch race conditions and flaky tests?
Yes. Retrace captures timing and thread interactions and replays them deterministically. This helps reproduce race conditions and flaky CI failures by replaying the run that failed.
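One common way to make thread interleavings replayable is to record the order in which threads acquire a shared lock, then force the same order during replay. The sketch below applies that idea to a single lock. It is a simplified illustration of the general technique, not Retrace's mechanism: it only controls acquisitions of this one lock, so races on unsynchronized state are out of its reach. `OrderedLock` and `run_workers` are hypothetical names.

```python
import threading
from typing import Optional


class OrderedLock:
    """Lock that records acquisition order, or enforces a previously recorded order."""

    def __init__(self, schedule: Optional[list[str]] = None) -> None:
        self._lock = threading.Lock()
        self._cond = threading.Condition()
        self.replaying = schedule is not None
        self.schedule: list[str] = list(schedule) if schedule is not None else []
        self._turn = 0  # position in the schedule during replay

    def __enter__(self) -> "OrderedLock":
        name = threading.current_thread().name
        if self.replaying:
            with self._cond:
                # Block until the recorded schedule says this thread goes next.
                while self.schedule[self._turn] != name:
                    self._cond.wait()
        self._lock.acquire()
        if not self.replaying:
            self.schedule.append(name)  # record who got the lock, in order
        return self

    def __exit__(self, *exc: object) -> bool:
        if self.replaying:
            with self._cond:
                self._turn += 1
                self._cond.notify_all()  # wake waiters so the next thread can check its turn
        self._lock.release()
        return False


def run_workers(lock: OrderedLock, log: list[str]) -> None:
    def worker() -> None:
        for _ in range(3):
            with lock:
                log.append(threading.current_thread().name)

    threads = [threading.Thread(target=worker, name=f"worker-{i}") for i in range(3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


recording = OrderedLock()
recorded_log: list[str] = []
run_workers(recording, recorded_log)

replayed_log: list[str] = []
run_workers(OrderedLock(schedule=recording.schedule), replayed_log)
print(replayed_log == recorded_log)  # True
```

The recorded interleaving may differ from run to run, but the replayed run always matches whichever interleaving was recorded, which is exactly the property needed to revisit a failing run.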
Open Source & Community
Is it really open source?
Yes. The Record-Replay core is open source under Apache 2.0.
Why is this a preview release?
We’re opening the agent early to gather feedback while we expand Python/library coverage and harden for GA.
How can I contribute?
Try the agent, file issues, and submit PRs on GitHub. Library compatibility reports and docs fixes are great first contributions.
How does it work?

Retrace records external interactions (DB, API calls, file I/O, time) during a real run, then replays them deterministically in your local debugger — no prod access needed.

  • Your production app runs normally.
  • External calls are captured automatically.
  • When the bug happens, Retrace captures it.
  • Replay in VS Code and debug the exact execution locally.
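The flow above (capture external calls while the app runs, then serve them back locally) can be illustrated with a pair of wrapper objects: one forwards calls to a real client and logs each result, the other answers the same calls from the log without contacting the service. This is a hypothetical sketch of the general proxy technique; `RecordingProxy`, `ReplayProxy`, and `PriceService` are invented for illustration and are not Retrace classes.

```python
from typing import Any, Callable


class RecordingProxy:
    """Forwards method calls to a real client and logs (name, args, kwargs, result)."""

    def __init__(self, target: Any, log: list[tuple]) -> None:
        self._target = target
        self._log = log

    def __getattr__(self, name: str) -> Any:
        # Only called for attributes not found on the proxy itself.
        attr = getattr(self._target, name)
        if not callable(attr):
            return attr

        def wrapper(*args: Any, **kwargs: Any) -> Any:
            result = attr(*args, **kwargs)
            self._log.append((name, args, kwargs, result))
            return result

        return wrapper


class ReplayProxy:
    """Answers the same calls from the log; the real service is never contacted."""

    def __init__(self, log: list[tuple]) -> None:
        self._events = iter(log)

    def __getattr__(self, name: str) -> Callable[..., Any]:
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            rec_name, rec_args, rec_kwargs, result = next(self._events)
            if (rec_name, rec_args, rec_kwargs) != (name, args, kwargs):
                raise RuntimeError(f"replay diverged: recorded {rec_name}{rec_args}, got {name}{args}")
            return result

        return wrapper


class PriceService:
    """Stand-in for a database or HTTP client whose answers change over time."""

    def __init__(self) -> None:
        self.calls = 0

    def price(self, sku: str) -> int:
        self.calls += 1
        return 100 + self.calls


log: list[tuple] = []
live = RecordingProxy(PriceService(), log)
recorded = [live.price("A"), live.price("B")]  # [101, 102]

offline = ReplayProxy(log)
replayed = [offline.price("A"), offline.price("B")]
print(replayed)  # [101, 102]
```

A fresh `PriceService` would answer differently on every run; the replay proxy returns the values the original run actually saw, which is what makes the offline session match production.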
Use cases

Perfect for:

  • Debug production-only bugs you can’t reproduce
    Replay the exact execution that already happened. No repro steps required.
  • Reproduce race conditions and timing-sensitive failures
    Capture and deterministically replay concurrency, async behavior, and thread interactions.
  • Stabilize flaky CI tests
    Replay the exact failing run to understand and fix non-deterministic test failures.
  • Debug systems with external dependencies
    Reproduce failures involving databases, APIs, file I/O, and other external services.
  • Investigate failures after the fact
    Inspect real code paths and state from incidents that are already over.
  • Debug non-deterministic systems (including AI calls and agents)
    Understand failures where behavior depends on timing, model responses, or external tools.


Ready to debug the bugs you can’t reproduce?
Join the future of Python debugging (Preview Release - Python 3.11 & 3.12)