Script Generation, Healing, and AI Refinement
Turn a passing AI run into a real Playwright spec, run it deterministically forever after, and let AI handle UI drift and intentional changes.
Script Generation
Test-Lab can turn any passing AI run into a real Playwright test that you check into your repo and run from your own CI. After generation, re-runs skip the AI entirely. The AI comes back only at two moments: when the script breaks and the healer offers a patch, or when you want to change behavior and use AI refinement to update the script through chat.
This page covers how to use each of those three pieces. For the why, see the blog post on self-healing Playwright tests.
Plans: Script generation is included on the Scale plan with no caps. Pay-as-you-go accounts get a free demo of up to 3 generated scripts and 5 runs per script, with an upgrade prompt at each cap. Free accounts do not have access.
Generate a script
The starting point is a passing AI run. Open the report for any test plan run that completed successfully.
- Scroll to the Generate Playwright Scripts card on the right side of the report.
- Click Generate Script (the button label adjusts when the test plan covers more than one device).
- Live progress moves through extraction, assertion authoring, assembly, and validation phases. Generation typically costs around 2× the original AI run.
- When the card flips to Script Generated, click Go to Test Plans to find the script-run controls on the plan's row.
The output is a regular Playwright spec stored against the test plan, keyed on the device. Multi-device test plans get one spec per device. Pipelines get one spec per step.
What the generated script contains
- Concrete selectors derived from the AI run's interaction trace
- Assertions tied back to the test plan's acceptance criteria
- Cookies and headers from the test plan's project / plan settings
- The same agent type behavior the AI run used (functional, performance, etc.)
The spec is plain Playwright. You can fork it, paste it into your own repo, run it from your own CI, or use it inside Test-Lab as a script run.
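For orientation, a generated spec has roughly this shape. Everything below is a hypothetical sketch: the URL, selectors, credentials, and assertion text stand in for whatever your AI run actually interacted with.

```ts
import { test, expect } from '@playwright/test';

// Hypothetical sketch of a generated spec. Real output embeds the
// concrete selectors and assertions extracted from your own AI run.
test('login completes and lands on the dashboard', async ({ page }) => {
  await page.goto('https://app.example.com/login');

  // Selectors derived from the AI run's interaction trace.
  await page.getByLabel('Email').fill('qa@example.com');
  await page.getByLabel('Password').fill(process.env.TEST_PASSWORD ?? '');
  await page.getByRole('button', { name: 'Sign in' }).click();

  // Assertions tied back to the test plan's acceptance criteria.
  await expect(page).toHaveURL(/\/dashboard/);
  await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();
});
```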
Run a generated script
Once a script exists for a test plan, the test plan's row in Test Plans shows a Run dropdown with a Script Run option.
- Script Run: executes the saved Playwright spec against your target. No LLM in the hot path. Runs in the same Docker container as AI runs, with the same screenshot and timing capture.
- Script Pipeline (when all steps in a pipeline have scripts generated): executes the full pipeline as a single deterministic Playwright run, sharing browser state across steps the same way the AI pipeline did.
Script runs cost a fraction of an AI run. Most of the cost is the CI minutes you already pay for. The report layout matches an AI run's report so you do not have to relearn how to read it.
Running scripts from your own CI
Each generated script is a regular Playwright spec. You can:
- Pull it out of Test-Lab and check it into your repo as part of your normal Playwright suite.
- Run it from your own Playwright config with no Test-Lab dependency.
- Keep it inside Test-Lab and use the API to trigger script runs from your CI on a schedule or on every merge.
Pick the option that matches how the rest of your test suite is organized.
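As a sketch of the third option, a CI trigger can be a single authenticated HTTP call. The endpoint path and the variable names below are assumptions for illustration, not the documented API; consult the API reference for the real contract.

```ts
// Hypothetical CI trigger (Node 18+, built-in fetch). The endpoint path
// and the TESTLAB_API_KEY / TEST_PLAN_ID variables are illustrative only.
const res = await fetch(
  `https://api.test-lab.example/v1/test-plans/${process.env.TEST_PLAN_ID}/script-runs`,
  {
    method: 'POST',
    headers: { Authorization: `Bearer ${process.env.TESTLAB_API_KEY}` },
  },
);

if (!res.ok) {
  // Surface the failure so the CI job goes red.
  throw new Error(`Failed to queue script run: ${res.status}`);
}
console.log('Queued script run:', await res.json());
```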
When a script breaks: the healer
Generated scripts go red the same way hand-written ones do. Selectors drift, modals get replaced with side panels, button copy changes. When that happens on a script run, the healer offers a patched version of the script.
How it works from the outside:
- The script run fails.
- The run report shows the failure plus a proposed patch as a diff.
- You review the diff and either:
  - Accept the patch. The script is updated; the next run uses the new version.
  - Reject the patch. The script stays as it was; you fix the test by hand or treat the failure as a real bug.
The healer is a suggestion engine, not an autopilot. Nothing changes in your saved script unless you click accept. A real bug in the application under test is meant to surface as a failed run, not get patched away.
Read every patch before accepting. Healed patches are model-generated. They are usually right when the failure is UI drift, and usually wrong when it is a genuine regression in the product. The diff view exists so you can tell the two apart.
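As a hypothetical illustration of the difference: drift patches touch locators and leave assertions alone, while a patch that loosens an assertion to make a failure pass is the kind to reject.

```diff
  // UI drift: button copy changed. Usually safe to accept.
- await page.getByRole('button', { name: 'Sign in' }).click();
+ await page.getByRole('button', { name: 'Log in' }).click();

  // Assertion loosened so a failing check passes. Usually a real
  // regression in the product: reject the patch and file a bug.
- await expect(page.getByText('Order confirmed')).toBeVisible();
+ await expect(page.getByText('Order pending')).toBeVisible();
```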
AI refinement: change behavior through chat
Healing handles unintentional breakages. Refinement handles intentional change.
If you want a generated script to assert on a different field, follow a different path, or test a new variant of an existing flow, open the script's Refine view and describe the change in plain English. The AI proposes a patch, you review the diff, you save or discard.
Examples of useful refinements:
- "Also assert that the welcome banner contains the user's first name."
- "Add a step before logout to capture the dashboard screenshot."
- "Change the booking date from tomorrow to next Monday."
Each refinement is a single conversation turn that produces a single diff. You can chain refinements: each turn picks up where the last one left off, so you can iterate without restarting.
Refinement uses AI credits. The cost per turn is small; on a stable suite, healing and refinement together rarely move the needle, and their total cost is bounded by how often you invoke them, not by the size of the suite.
What this changes about flaky tests
The maintenance cost of an E2E suite usually scales with how often the underlying app changes. A suite with 200 hand-written tests and a UI that ships weekly is a part-time job for at least one engineer.
Generated scripts plus healing change the math:
- The first time a test breaks, you do not write off the failure as flaky. You look at the proposed patch.
- If the patch is right, you accept and move on. If it is wrong, you treat the failure as a real bug.
- The default response to a red CI run becomes "review the diff," not "rerun and hope."
See the flaky tests field guide for upstream causes and mitigations that complement this workflow.
Limitations and gotchas
- Generation requires a passing AI run. A run that failed cannot be turned into a script directly; fix the underlying test plan, run it, then generate.
- Pipeline scripts must be generated per step. Multi-step pipelines need a passing AI run for each step before the Script Pipeline option appears.
- Stale scripts (the test plan was edited after the script was generated) are flagged in the run dropdown as (outdated). Run them as-is to check whether the old script still passes against the updated plan, or regenerate.
- Browser devices still apply. Script runs go through the same Playwright container as AI runs, so devices configured at the test plan level carry over.
- Cookies and credentials are picked up from the project / test plan settings on every script run. Changes you make in the UI take effect on the next run.
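In practice, settings-driven cookies typically surface in the spec as a setup step along these lines. This is a sketch; the actual generated wiring, cookie name, and domain are assumptions.

```ts
import { test } from '@playwright/test';

// Sketch: cookies from project / test plan settings applied before
// each test. Name, value, and domain below are placeholders.
test.beforeEach(async ({ context }) => {
  await context.addCookies([
    {
      name: 'session',
      value: process.env.SESSION_COOKIE ?? '',
      domain: 'app.example.com',
      path: '/',
    },
  ]);
});
```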
Plans and quotas
| Plan | Generation | Runs per script | Healing | Refinement |
|---|---|---|---|---|
| Free | Not available | Not available | Not available | Not available |
| Pay-as-you-go (demo) | 3 scripts total | 5 runs each | Included | Included |
| Scale | Unlimited | Unlimited | Included | Included |
| Enterprise | Unlimited | Unlimited | Included | Included |
PAYG demo limits exist so the feature is accessible without a plan upgrade, while still pointing power users toward Scale. The cap counts distinct test plans (a multi-device test plan counts as one), and refinement and healing turns do not count toward the run cap.