Give your AI debugger the failed run, not another traceback.

Hook Retrace into CI to capture failed pytest runs as replayable artifacts. The AI debugger inspects runtime values directly, which means fewer tokens, fewer runs, and faster turnaround.

pip install retracesoftware

Free in preview · Open-source engine · Python 3.11 & 3.12 · Runs locally or in CI · Verify in VS Code · Backed by PWV

The AI works backwards through a failed pytest run using the Retrace debugger, finds the root cause, and shows the values that actually occurred. Open the same recording in VS Code to verify it.

AI debugger for CI failures

Diagnose failed pytest runs from runtime evidence, not guesses.

Failed runs become artifacts

Keep failed pytest and CI runs as .retrace files.

Runtime values, not just logs

The debugger works from stack frames, variables, and the values from the failed run.

Verify in VS Code

Open the same recording and step through the failed execution yourself.

No test rewrite

Wrap your existing pytest command. No special harness required.

The same recording model works for production failures when you are ready.

Recording overhead is under 0.1% on typical web workloads.

See the performance benchmarks →

CI artifact
Keep failed CI runs as replayable artifacts.

Run pytest under Retrace in CI. If the job passes, discard the recording. If it fails, keep the .retrace artifact.

The Retrace AI debugger inspects the failed run and returns a root-cause diagnosis backed by runtime values. Open the same artifact in VS Code to verify the result yourself.

1. pytest in CI
2. failure
3. .retrace artifact
4. AI diagnosis
5. VS Code replay
Works as a plain CI artifact. No platform-specific plugin required.
GitHub Actions snippet
- name: Run pytest with Retrace
  run: |
    mkdir -p recordings
    RETRACE_RECORDING=recordings/failed-run.retrace python -m pytest

- name: Upload Retrace recording
  if: failure()
  uses: actions/upload-artifact@v4
  with:
    name: retrace-failed-run
    path: recordings/failed-run.retrace
QUICK START
Run Retrace on your next failed pytest run.
Run your tests normally. If they fail, keep the failed execution as a replayable .retrace artifact.
Step 1 - install
pip install retracesoftware. No app rewrite required.
Shell
python -m venv .venv
source .venv/bin/activate

python -m pip install retracesoftware

Step 2 - Run pytest with Retrace
One environment variable. If the test passes, discard the recording. If it fails, keep it.
Record
mkdir -p recordings
RETRACE_RECORDING=recordings/failed-run.retrace \
  python -m pytest

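The keep-on-failure rule can also be scripted. The following is a minimal Python sketch, not part of Retrace: the helper name `run_and_keep_on_failure` is hypothetical, and it assumes the recording written to the `RETRACE_RECORDING` path is a single file. Only the environment variable itself comes from the commands above.

```python
import os
import subprocess
import sys
from pathlib import Path


def run_and_keep_on_failure(command, recording_path):
    """Run `command` with RETRACE_RECORDING set; keep the file only on failure.

    Hypothetical helper: assumes the recording is a single file at
    `recording_path`, which may not match the real artifact layout.
    """
    recording = Path(recording_path)
    recording.parent.mkdir(parents=True, exist_ok=True)

    # Pass the recording path through the same variable the shell example uses.
    env = dict(os.environ, RETRACE_RECORDING=str(recording))
    exit_code = subprocess.run(command, env=env).returncode

    # A passing run has nothing to diagnose, so drop its recording.
    if exit_code == 0 and recording.exists():
        recording.unlink()
    return exit_code
```

A call such as `run_and_keep_on_failure([sys.executable, "-m", "pytest"], "recordings/failed-run.retrace")` would then mirror the shell command above and return pytest's exit code.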
Step 3 - Diagnose or verify
Get a root-cause diagnosis from the AI debugger, or open the recording in VS Code to inspect the failed execution yourself.
Replay
code .

# Open recordings/failed-run.retrace
# Start replay from the Retrace sidebar
# Step backwards from the failure

No live test process required. You are debugging the recorded execution.
Want to see this end-to-end with a real example? Try the 10-Minute Demo.
What makes this different
AI agents should not debug from tracebacks alone.
| | Today | With Retrace |
| --- | --- | --- |
| CI artifacts | CI artifacts are logs and tracebacks | Failed CI runs become replayable artifacts |
| AI agents | AI agents infer from partial context | The AI debugger gets runtime evidence |
| Failure | Stack traces show where Python stopped | Retrace shows the values that led there |
| What gets preserved | Logs show what you predicted would matter | Retrace preserves the failed execution |
The failed execution becomes something the AI can diagnose and you can verify.
How it works.
1. RUN
Run pytest, CI, or your Python app with Retrace enabled.

2. RECORD
Retrace records the execution into a `.retrace` artifact.
Example artifact: run-2025-05-05.retrace (single source of truth)

3. DIAGNOSE
The AI debugger opens the recording and finds the root cause from the runtime values.

4. VERIFY
Open the same recording in VS Code and step through the failed execution.

Once the cause is clear, your agent can propose a fix.

Deterministic & reproducible
Works locally, offline
Shareable artifact
Perfect context for AI
Why the AI debugger can trust the evidence.

Retrace does not ask the AI to infer runtime state. It records the failed execution and replays it deterministically, so the debugger inspects the values that actually occurred, not a reconstruction. That is why the diagnosis points to real values at real steps, and why you can verify every one of them.

App runs normally: your production app runs as usual, and external calls are captured automatically.
Bug happens: Retrace captures it.
Debug locally: replay the exact execution in VS Code.
PROVENANCE ENGINE · EARLY ACCESS
From runtime evidence to root cause.
Replay shows the failed execution. Provenance lets the debugger ask where a value came from.

Retrace's provenance engine traces any value back through the execution — from the point you noticed it, through every transformation, to the original input that caused it.

  • Select any value. Jump to its origin.

    Click a variable in the debugger and instantly see the exact line and inputs that produced it.

  • Chain backwards through transformations.

    Each origin has its own provenance. Keep drilling back until you reach the root cause.

  • Works on every value, not just outputs.

    Intermediate variables, function returns, container mutations — provenance covers everything in the execution.

Now in early access with select design partners.

 

PROVENANCE DRILLBACK

Three clicks from ZeroDivisionError to root cause: the API caller sent qty: "0" in the request body. No manual searching. No log correlation.
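To make the drillback concrete, here is a hypothetical sketch of the kind of code that could produce this failure. The function names and request shape are assumptions made for illustration. Only the `qty: "0"` value and the ZeroDivisionError come from the example above, and nothing here uses Retrace itself. It shows why a traceback alone is not enough: the exception is raised in one function, while the offending value entered the program somewhere else.

```python
import traceback


def parse_order(body):
    # Hypothetical request parser: converts the JSON strings to numbers.
    return {"total": float(body["total"]), "qty": int(body["qty"])}


def unit_price(order):
    # The traceback ends here, but the zero was created upstream.
    return order["total"] / order["qty"]


# Assumed request body; only qty: "0" is taken from the example above.
request_body = {"total": "19.98", "qty": "0"}

try:
    unit_price(parse_order(request_body))
except ZeroDivisionError as exc:
    # The last traceback frame names where Python stopped, not where qty came from.
    failing_frame = traceback.extract_tb(exc.__traceback__)[-1].name
    print(f"traceback stops in: {failing_frame}")
    print(f"value originated in the request body: qty={request_body['qty']!r}")
```

Walking from the division back through `parse_order` to the request body is the chain a provenance drillback would follow.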

Production
Start in CI. Use the same model for production.

Retrace starts with Python CI failures because the value is immediate and the risk is low. The same recording model works for production failures when you are ready: record the execution once, replay it safely, and inspect what actually happened. Recording overhead is under 0.1% on typical web workloads.

How it works

1. Python code

2. Boundary calls: DB, API, files, time, randomness

3. .retrace recording: CALL, RESULT, ERROR

4. Local replay: same code, external calls stubbed, thread ordering preserved

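The record-then-stub idea in the steps above can be illustrated with a toy Python sketch. This is not Retrace's implementation or file format: the `Boundary` class, its event dictionaries, and the `pick_discount` function are all hypothetical, and the sketch ignores threads entirely. It only shows the general technique of logging each boundary call's result or error, then answering the same calls from the log during replay.

```python
import random
import time


class Boundary:
    """Toy record/replay wrapper for boundary calls (illustrative only)."""

    def __init__(self, log=None):
        self.replaying = log is not None
        self.log = list(log) if self.replaying else []
        self._cursor = 0

    def call(self, name, fn, *args):
        if self.replaying:
            # Replay: answer from the log instead of touching the outside world.
            event = self.log[self._cursor]
            self._cursor += 1
            if event["call"] != name:
                raise RuntimeError(f"replay diverged at {name!r}")
            if event["kind"] == "ERROR":
                raise event["value"]
            return event["value"]
        # Record: run the real call and log its RESULT or ERROR.
        try:
            value = fn(*args)
        except Exception as exc:
            self.log.append({"call": name, "kind": "ERROR", "value": exc})
            raise
        self.log.append({"call": name, "kind": "RESULT", "value": value})
        return value


def pick_discount(boundary):
    # Time and randomness are non-deterministic, so both go through the boundary.
    now = boundary.call("time.time", time.time)
    roll = boundary.call("random.random", random.random)
    return {"now": now, "discount": 0.2 if roll < 0.5 else 0.0}


recorder = Boundary()
original = pick_discount(recorder)
replayed = pick_discount(Boundary(log=recorder.log))
print(replayed == original)
```

Because the replay never calls `time.time` or `random.random` again, the same code follows the same path with the same values as the recorded run.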
Use cases

Perfect for:

  • Diagnose failed CI runs with AI.
    Replay the exact execution that already happened. No repro steps required.
  • Help coding agents debug broken tests.
    Give coding agents runtime evidence from the failed run, not just a traceback.
  • Stabilise flaky tests.
    Replay the exact failure to understand non-deterministic behaviour.
  • Reproduce external dependency failures.
    Replay failures involving APIs, databases, files, time, or other external calls.
  • Investigate after the fact.
    Inspect real code paths and runtime state after the process has exited.
  • Debug production-only failures.
    Use the same recording model for production crashes you cannot reproduce locally.
Q&A Section
Getting started
Can I use this with pytest?
Yes. The first workflow is wrapping an existing pytest command and keeping the `.retrace` file when the run fails.
Do I need to change my tests?
No. The goal is to wrap the command you already run.
What Python versions work?
Python 3.11 and 3.12.
CI, AI, production
Is it free?
Yes. Retrace is free to use during preview on real pytest and CI failures, no card required. We will introduce paid plans later for teams that need more. Pricing will live on a pricing page when it is ready.
How does this help AI coding agents?
Agents normally see source code, logs, and tracebacks. Retrace gives them the runtime values from the run that failed. The Retrace AI debugger uses those values to return an evidence-backed root-cause diagnosis, and MCP-compatible agents can inspect recordings directly where supported.
Can I use this in CI?
Yes. Run pytest under Retrace and keep the .retrace file as a CI artifact when the job fails. The AI debugger analyses the artifact, and you can open the same recording in VS Code.
Can I verify what the AI debugger says?
Yes. Every diagnosis is backed by a .retrace recording. Open it in VS Code and inspect the failed execution yourself.
Do I pay for failed recordings?
No. Capturing a failed run is free. When we introduce paid plans, they will be based on completed diagnoses, not on failures.
How It’s Different
Why can’t I just re-run the request?
Because many production failures depend on timing, concurrency, external services, or non-deterministic behavior.

A re-run often takes a different path.

Retrace lets you debug the exact execution that happened, after the fact.
How is this different from logging or APM tools?
Logs and APM tools show symptoms and depend on what you instrument. They can’t reconstruct past state.

Retrace records the real execution and lets you replay it deterministically, so you can inspect the actual code path and state.
Can it catch race conditions and flaky tests?
Yes. Retrace captures timing and thread interactions and replays them deterministically. This helps reproduce race conditions and flaky CI failures by replaying the run that failed.
Open Source & Community
Is it really open source?
Retrace is open source and built for Python developers.
Why is this a preview release?
We’re opening the agent early to gather feedback while we expand Python/library coverage and harden for GA.
How can I contribute?
Try the agent, file issues, and submit PRs on GitHub. Library compatibility reports and docs fixes are great first contributions.

 

Built by Retrace Software.

Backed by Preston-Werner Ventures.

 


Ready to debug your next Python CI failure with runtime evidence?
Start free with pytest. Keep failed runs as .retrace artifacts and let the AI debugger find the cause. Verify it in VS Code.

Open source · Python 3.11 & 3.12 · Pytest in one command · VS Code replay