Flaky tests are where good test suites go to die. A selector gets renamed, a page renders a beat slower, a button shifts down the layout, and a test that was green yesterday is red today, blocking a deploy that was never broken. Someone gets pinged, opens the run, realizes it is not a real bug, and spends twenty minutes nudging the test back to green. Multiply that across a suite and a team, and people quietly stop trusting the tests at all.

Most testing tools are very good at telling you a test is flaky. They are not good at doing anything about it. You still get the alert, and you still do the fixing.

Test-Lab now does the fixing. This is the feature we are proudest of, and we think it changes what an end-to-end test suite costs to own.

What flaky tests really cost

Flakiness does not feel expensive one failure at a time. It is expensive in aggregate, and the bill lands on your most senior people.

Consider a suite of 300 end-to-end tests that runs on every push, say 30 times a day. A 2 percent per-test flake rate, modest for a real browser suite, means almost every run goes red: on the order of six tests failing each time for no real reason, and a run is red whenever even one of them does. Day to day, that suite is red far more often than it is green, and almost none of it is a bug.
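The arithmetic behind that claim is short enough to check by hand. Here is a minimal sketch, assuming every test flakes independently and at the same rate (real suites are messier, and the function names are ours, for illustration only):

```typescript
// Sketch: how a small per-test flake rate compounds across a suite.
// Assumption: each test flakes independently with the same probability.

/** Expected number of tests that fail for no real reason in a single run. */
function expectedFlakyFailures(testCount: number, flakeRate: number): number {
  return testCount * flakeRate;
}

/** Probability that at least one test flakes, which is enough to turn the run red. */
function probabilityRunIsRed(testCount: number, flakeRate: number): number {
  return 1 - Math.pow(1 - flakeRate, testCount);
}

const tests = 300;
const flakeRate = 0.02;

// About six false failures per run on average.
console.log(expectedFlakyFailures(tests, flakeRate));

// Well above 99 percent of runs contain at least one false failure.
console.log(probabilityRunIsRed(tests, flakeRate));
```

Under those assumptions a fully green run is the rare exception, which is why the suite spends most of its life red.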

Each red run still pulls a person in: open it, read the trace, confirm it is noise, re-run, wait. Call it fifteen minutes. Across thirty runs a day, even if only some get a proper look, that is hours of senior engineering time, every day, spent proving that nothing was wrong. At a loaded engineering rate, the triage tax on a flaky suite runs to tens of thousands of dollars a year for a small team, and into six figures at scale, to keep a suite green that was never really red.
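To see how that adds up, here is a rough sketch of the triage-cost arithmetic. Only the 30 runs a day and the fifteen minutes per triage come from the numbers above; the share of red runs that actually get investigated, the working days per year, and the hourly rate are illustrative assumptions, so plug in your own.

```typescript
// Sketch of the triage-cost arithmetic. Inputs marked "assumption" are
// illustrative placeholders, not measured figures.

interface TriageInputs {
  runsPerDay: number;
  redRunShare: number;         // share of runs that go red for no real reason
  investigatedShare: number;   // share of red runs someone properly looks at
  minutesPerTriage: number;
  workingDaysPerYear: number;
  loadedHourlyRateUsd: number;
}

function annualTriageCostUsd(i: TriageInputs): number {
  const triagesPerDay = i.runsPerDay * i.redRunShare * i.investigatedShare;
  const hoursPerDay = (triagesPerDay * i.minutesPerTriage) / 60;
  return hoursPerDay * i.workingDaysPerYear * i.loadedHourlyRateUsd;
}

const example: TriageInputs = {
  runsPerDay: 30,            // from the scenario above
  redRunShare: 1,            // simplification: treat nearly every run as red
  investigatedShare: 0.2,    // assumption: one red run in five gets a proper look
  minutesPerTriage: 15,      // from the scenario above
  workingDaysPerYear: 250,   // assumption
  loadedHourlyRateUsd: 100,  // assumption
};

// 1.5 hours a day, 375 hours a year: roughly 37,500 dollars under these inputs.
console.log(Math.round(annualTriageCostUsd(example)));
```

Even with most red runs ignored, the total lands in the tens of thousands of dollars a year, and it scales linearly with every input.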

The harder cost is trust. Once a suite cries wolf often enough, people start ignoring red. The day a real regression slips through a wall of known-flaky failures, the suite has failed at the one job it had. Flaky tests do not just waste time. They quietly erode the reason you wrote the tests in the first place.

Auto-fix for flaky tests

When one of your saved Playwright tests goes flaky, or starts failing after a stretch of passing, Test-Lab can now repair it on its own. The AI compares a run where the test failed against a run where the same test passed, works out what actually drifted (a stale selector, a wait that was a touch too short, a step racing the page), rewrites the test to handle it, and re-runs the new version to confirm it is genuinely green before it touches anything.

If the fix holds, your saved test is updated and you get a notification that it was auto-fixed, with the exact diff of what changed. If it does not hold, nothing changes: your original test is left exactly as it was, and you are told it needs a human. You wake up to a suite that is greener than you left it, not to a queue of red tests to triage.

It is live now in beta, on the Scale plan.

How auto-fix repairs a flaky test

The mechanism is the part that matters, because it is what separates a real fix from a lucky retry.

Evidence, not guesswork. Auto-fix reads your actual run history: a run where the test failed and a recent run where the same test passed. The difference between those two runs is the signal. A selector that resolved before and does not now, a step that arrived a beat early, a value that landed late. The fix is reasoned from what really happened, not from a template.
A real change to the test. It rewrites the saved Playwright script to remove the cause: synchronizing on the right signal, waiting for the element that actually gates the next step, or updating a locator the app moved. It never simply loosens an assertion to force a pass.
Proof before it ships. The rewritten test is run several times, start to finish, and is only kept if it passes every time. A fix that cannot be made reliably green is thrown away and your original is restored. Nothing is applied on hope.

That loop (diff the evidence, change the test, prove the change) is what makes auto-fix trustworthy enough to let near a suite you care about.
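To make the first step of that loop concrete, here is a minimal sketch of diffing a passing run against a failing one. It is an illustration, not Test-Lab's implementation: the `StepRecord` shape, the `findDrift` function, and the 500 ms threshold are all hypothetical, and a real analysis would weigh far more evidence than two fields per step.

```typescript
// Hypothetical record of one step in a recorded run (an assumption for this sketch).
interface StepRecord {
  name: string;
  selectorResolved: boolean; // did the step's locator find its element?
  durationMs: number;        // how long the step took
}

type Drift =
  | { kind: "stale-selector"; step: string }
  | { kind: "timing"; step: string; slowdownMs: number }
  | { kind: "none" };

/** Walk both runs in order and report the first step where the failing run diverges. */
function findDrift(
  passed: StepRecord[],
  failed: StepRecord[],
  timingThresholdMs = 500, // assumption: slowdowns smaller than this are treated as noise
): Drift {
  for (let i = 0; i < passed.length; i++) {
    const good = passed[i];
    const bad = failed[i];
    if (!good || !bad) break; // the failing run ended early; nothing left to compare

    // A locator that resolved before and does not now points at a stale selector.
    if (good.selectorResolved && !bad.selectorResolved) {
      return { kind: "stale-selector", step: bad.name };
    }
    // A step that became much slower points at a wait that is now too short.
    const slowdownMs = bad.durationMs - good.durationMs;
    if (slowdownMs > timingThresholdMs) {
      return { kind: "timing", step: bad.name, slowdownMs };
    }
  }
  return { kind: "none" };
}

const passingRun: StepRecord[] = [
  { name: "open cart", selectorResolved: true, durationMs: 120 },
  { name: "click checkout", selectorResolved: true, durationMs: 90 },
];
const failingRun: StepRecord[] = [
  { name: "open cart", selectorResolved: true, durationMs: 130 },
  { name: "click checkout", selectorResolved: false, durationMs: 5000 },
];

console.log(findDrift(passingRun, failingRun));
```

The point of the sketch is the shape of the reasoning: the repair starts from a specific, observed difference between two real runs rather than from a generic guess.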

Self-healing tests, done properly

Self-healing tests have been promised for years. In practice, most implementations are shallow: they swap a broken CSS selector for a guess at a new one and move on. That helps with exactly one failure mode and silently makes the rest worse.

Auto-fix is different on two fronts. It reasons about the whole test from real run evidence rather than patching a single selector blindly, and it proves the fix by running it before anything is saved. A repair is only ever kept when the test actually passes, several times in a row. That is the difference between a suite that heals and a suite that quietly rots.

It also goes beyond pure flakiness. When a test was passing and breaks after a refactor (a wrapper that makes a locator ambiguous, a slower render, a step that now races the page), auto-fix works out from the evidence what changed, adjusts the test to match, and verifies it. And when the only way to make a test pass would be to point an assertion at a different element, it does not guess: it leaves that one for you, with the evidence, rather than risk masking a real change. The everyday churn of keeping tests in step with a moving product is exactly the work it takes off your plate.

This is not "just retry it"

Retries are the usual answer to flakiness: run the test again, and again, until it happens to go green. That hides the problem without solving it. The test is still flaky. If it failed one run in ten before, it still does. It still cries wolf, just less often, and it still breaks for real the day the flakiness gets worse.
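A little arithmetic shows why retries only change what you see. This is a sketch that assumes each attempt fails independently at the same rate, which is generous to retries; the function names are ours.

```typescript
// Sketch: what retries change and what they leave alone.
// Assumption: every attempt fails independently with probability `flakeRate`.

/** Chance a run is still reported red after the first attempt plus `retries` re-runs. */
function probabilityReportedRed(flakeRate: number, retries: number): number {
  return Math.pow(flakeRate, retries + 1);
}

/** Expected first-attempt failures over many runs; retries do not affect this. */
function expectedFirstAttemptFailures(runs: number, flakeRate: number): number {
  return runs * flakeRate;
}

const oneInTen = 0.1;

// With two retries, only about one run in a thousand is reported red...
console.log(probabilityReportedRed(oneInTen, 2));

// ...but the test still fails its first attempt about ten times in a hundred runs.
console.log(expectedFirstAttemptFailures(100, oneInTen));
```

The reported failure rate drops, but the underlying fragility is untouched, and every hidden failure still costs a re-run's worth of CI time.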

Auto-fix is different because it changes the test. It addresses the reason the test was fragile, so the next hundred runs pass for the right reason. A retry buys you a quieter afternoon. A fix buys you a test you can trust again, and you pay for it once instead of paying the triage tax on every run.

It only fixes what should be fixed

The fastest way to lose trust in something like this is for it to "fix" a test that was right all along, a test that went red because it caught a real bug. Auto-fix is built to never do that.

A repair is only applied when two things are true: the failure looks like test fragility rather than a genuine change in your product, and the rewritten test passes a fresh set of runs from start to finish. A test that is correctly failing because your app changed is left red, on purpose, so you see the bug. A fix that cannot be made to pass reliably is discarded, and your original is kept. The bar to overwrite one of your tests is high by design.
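The rule in that paragraph can be written down as a small decision function. This is a sketch of the logic as described here, not Test-Lab's code: the type names and the `decideOutcome` function are hypothetical, and the hard part in practice is the classification that feeds it.

```typescript
// Hypothetical labels for the two inputs the rule depends on.
type FailureKind = "test-fragility" | "product-change";
type Outcome = "apply-fix" | "leave-red" | "discard-fix";

/**
 * Apply a fix only when the failure looks like test fragility AND the
 * rewritten test passed every verification run.
 */
function decideOutcome(kind: FailureKind, verificationRuns: boolean[]): Outcome {
  // A test that is correctly failing because the app changed stays red.
  if (kind === "product-change") return "leave-red";

  // No verification evidence counts as a failed proof, not a pass.
  const allPassed =
    verificationRuns.length > 0 && verificationRuns.every((passed) => passed);

  return allPassed ? "apply-fix" : "discard-fix";
}

console.log(decideOutcome("test-fragility", [true, true, true]));  // "apply-fix"
console.log(decideOutcome("test-fragility", [true, false, true])); // "discard-fix"
console.log(decideOutcome("product-change", [true, true, true]));  // "leave-red"
```

Note the asymmetry: a single failed verification run is enough to discard a fix, while no number of passing runs can override a failure that looks like a real product change.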

You stay in control the whole time. Every auto-fix lands in your activity log and run reports with the full before-and-after diff, so nothing changes silently. It is opt-in per account, and you can switch it off whenever you want.

Cut test maintenance toward zero

Add the pieces up and the math changes. The false failure no longer reaches a person as a triage task; it reaches them as a fixed test and a one-line diff. Fifteen minutes of investigation becomes fifteen seconds of reading what changed. Because the fix is a real change rather than a retry, the same flake does not return next week, so you stop paying the tax on every run.

Maintenance, historically the single biggest reason teams give up on end-to-end testing, drops toward the floor. The engineers who used to babysit the suite get their afternoons back. The suite stays green for real reasons, so a red build means something again. That is the whole point: not a quieter suite, a suite you can trust, at a fraction of the upkeep.

Frequently asked questions

Is auto-fix just automatic retries? No. A retry re-runs the same test and hopes; auto-fix changes the test to remove the cause of the failure, then verifies the new version passes several times before saving it. The flake does not come back.

Will auto-fix hide a real bug? No. It only acts when the failure looks like test fragility, and it leaves a test that is correctly catching a real regression red, on purpose. Every change is verified before it is applied and recorded with a full diff, so nothing is masked and nothing is silent.

Which tests does it work on? Saved Playwright end-to-end tests: the ones running in CI, on a schedule, or kicked off from the CLI or MCP. AI runs already self-heal as they go, so auto-fix is about your saved scripts.

What does it cost? It is in beta on the Scale plan. Each fix attempt uses AI and is billed like a Refine; the verification re-runs are free. You can turn it off at any time.

Do I have to change my tests or my pipeline? No. Turn it on in Settings and it works on your existing runs. There is nothing to install and nothing to rewrite.

Turn it on

Open Settings > Preferences and enable "Auto-fix flaky tests." From then on, flaky and newly failing script runs are repaired and verified for you, and you are told exactly what changed.

Flaky tests have been the tax you pay for end-to-end coverage since end-to-end testing existed. Auto-fix is how we start paying it down for you. It is in beta, we are steadily widening what it can repair, and if you are on the Scale plan it is already sitting in your Settings, waiting to be switched on.

Tests That Fix Themselves: Auto-Fix for Flaky Tests