92% of US developers use AI coding tools every day. 41% of all code written globally is now AI-generated. Andrej Karpathy coined the term "vibe coding" in early 2025, and by the end of the year it was Collins Dictionary's Word of the Year.
The pitch is simple: describe what you want in plain English, let the AI generate it, and iterate based on what you see. Don't read every line. Trust the vibes.
It works. Sort of. The code appears fast, the demos look great, and the dopamine hit of shipping a feature in 20 minutes is real. But there's a growing crack in the foundation that nobody wants to talk about: most vibe-coded software ships without real testing.
And that's where things start breaking.
The vibe coding moment
The numbers are hard to ignore.
25% of Y Combinator's W25 batch is running on codebases that are 95% AI-generated. 63% of people using vibe coding tools aren't even developers - they're designers, PMs, founders who can now build functional software by talking to a model.
Tools like Cursor, Claude Code, and Codex have made it possible to go from idea to working prototype in hours instead of weeks. The barrier to writing code has essentially disappeared.
But the barrier to writing correct code hasn't moved at all.
Where the vibes stop
There's a pattern we see over and over. Someone vibe codes a landing page. It looks great. They add a signup flow. Works fine. They build a dashboard. Looking good.
Then they hit authentication edge cases. Payment webhooks. Race conditions in async state. Form validation across 15 fields. The "difficulty cliff" shows up fast - and the AI that breezed through the landing page starts producing code that fails in subtle, hard-to-catch ways.
The stats back this up:
- 63% of developers have spent more time debugging AI-generated code than they would have writing it themselves
- 72% of organizations have experienced P1 incidents caused by AI-generated code
- 45% of AI-generated code contains security vulnerabilities
- AI-generated code is 2.7x more likely to introduce XSS vulnerabilities than human-written code
That last one is worth sitting with. Almost half of AI-generated code has security flaws. And the people using vibe coding tools are often the ones least equipped to spot them - because the whole point is that you don't need to read every line.
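To make the XSS pattern concrete, here is an illustrative sketch (not taken from any specific AI output) of the single most common flaw: interpolating untrusted input straight into HTML. The function names are invented for this example.

```typescript
// Vulnerable pattern AI tools frequently generate: user input goes
// straight into markup. A comment like <img src=x onerror=alert(1)>
// would execute script in the user's browser.
function renderCommentUnsafe(comment: string): string {
  return `<div class="comment">${comment}</div>`;
}

// Safer: escape HTML metacharacters before interpolation.
// (Order matters: '&' must be replaced first.)
function escapeHtml(input: string): string {
  return input
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

function renderComment(comment: string): string {
  return `<div class="comment">${escapeHtml(comment)}</div>`;
}
```

The two versions look nearly identical in a quick skim, which is exactly why "don't read every line" lets this class of bug through.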
Why vibe coding makes testing harder
Traditional testing assumes you wrote the code and understand what it does. You write tests based on your mental model of the system. You know which edge cases matter because you thought through them while building.
Vibe coding breaks that assumption.
You can't test what you don't understand. 40% of junior developers deploy AI-generated code without fully understanding what it does. They can't write meaningful tests for it because they don't know what the failure modes are. The code works in the demo. That's enough, right?
AI generates code fast but doesn't generate test plans. You can ask Cursor to build you a checkout flow in 10 minutes. But it won't proactively say "here are the 15 things that could go wrong and how to verify each one." The code arrives without a map of its own risks.
The prototype becomes the product. Vibe-coded prototypes have a habit of becoming production systems. The plan was always to "clean it up later." Later never comes. And now you have a codebase nobody fully understands, with no tests, handling real user data.
The irony is thick. The same AI that writes buggy code is the one people turn to for testing. "Write tests for this code" produces tests that pass - because the AI generates tests that match its own assumptions, not the actual requirements. The tests validate the implementation, not the behavior.
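Here is a hypothetical example of that failure mode. The validator and its test are both invented for illustration: the generated test exercises only the assumptions already baked into the code, so it passes while the actual requirement is unmet.

```typescript
// Requirement: reject empty or whitespace-only names.
// The generated code only checks length, so "   " slips through.
function isValidName(name: string): boolean {
  return name.length > 0 && name.length <= 50;
}

// The generated test mirrors the implementation's own assumption
// (length checks) rather than the requirement.
function testIsValidName(): boolean {
  return isValidName("Ada") === true && isValidName("") === false;
}

// testIsValidName() returns true - the suite is green - yet
// isValidName("   ") still accepts a whitespace-only name.
```

Green tests like this validate the implementation, not the behavior, which is precisely the trap described above.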
Testing doesn't have to mean writing test code
Here's the thing that most vibe coders miss: testing doesn't have to look like expect(result).toBe(42).
If the whole point of vibe coding is that you describe what you want in plain English and the AI figures out the implementation - why can't testing work the same way?
Instead of writing test scripts, describe what should work:
- "A user can sign up with an email and password, then see their dashboard"
- "Adding an item to the cart updates the total price"
- "The payment form rejects expired credit cards"
- "Logged-out users get redirected to the login page when accessing /settings"
That's it. No selectors. No assertions. No page objects. Just plain descriptions of what your app should do.
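A test plan at this level of abstraction can live in something as simple as a small YAML file. The format below is purely illustrative - it is not Test-Lab's actual schema - but it shows how little structure is needed when the steps are plain English:

```yaml
# Illustrative sketch - field names and file format are hypothetical.
tests:
  - name: signup happy path
    check: A user can sign up with an email and password, then see their dashboard
  - name: cart total updates
    check: Adding an item to the cart updates the total price
  - name: expired card rejected
    check: The payment form rejects expired credit cards
  - name: settings requires login
    check: Logged-out users get redirected to the login page when accessing /settings
```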
An AI testing agent can take those descriptions, open a real browser, navigate your app like a user would, and verify that the behavior matches. If the button doesn't work, the agent finds out. If the redirect is broken, the agent catches it. If the form accepts garbage input, the agent reports it.
This is what "vibe testing" actually means - testing at the same level of abstraction as vibe coding. You described what to build in English. Now describe what should work in English. Let AI handle both sides.
The workflow that closes the loop
Right now, most vibe coding workflows look like this:
- Describe what you want
- AI generates code
- Look at it, seems fine
- Ship it
- Find out it's broken when users complain
Step 3 is where the process fails. "Seems fine" is not testing. Looking at a page and clicking around for 30 seconds is not QA.
Here's what the workflow should look like:
- Describe what you want
- AI generates code
- Describe what should work (plain English test plan)
- AI testing agent verifies the behavior in a real browser
- Fix what's broken, iterate
- Ship it with confidence
Steps 3 and 4 are the missing pieces. They take minutes, not hours. And they catch the kind of bugs that "seems fine" misses - the broken edge case, the form that accepts empty fields, the button that doesn't work on mobile, the redirect that loops forever.
The point isn't to slow down vibe coding. It's to make it actually safe to ship.
The cost of skipping this step
Let's be blunt about what happens when vibe-coded apps ship without testing.
The small stuff: broken links, forms that don't validate, buttons that do nothing on certain browsers. Annoying. Fixable. But embarrassing when users hit them.
The medium stuff: checkout flows that silently fail, auth that can be bypassed, data that gets corrupted by race conditions. This is where the 72% P1 incident stat comes from.
The big stuff: security vulnerabilities in production. With 45% of AI-generated code containing security flaws, and vibe coders not reading every line, some of those flaws are making it to production. XSS, SQL injection, broken access controls - the classics, generated fresh by AI.
The speed of vibe coding makes this worse, not better. When you can ship a feature every hour, you can also ship a vulnerability every hour. The feedback loop is so fast that bugs accumulate before anyone notices.
Testing is the brake pedal. Not to stop you from going fast - to stop you from going fast into a wall.
What we built for this
At Test-Lab, this is exactly the problem we set out to solve.
You write test plans in plain English. Our AI agent opens a real browser, navigates your app, and executes those tests like a human QA tester would. No scripts. No selectors. No test infrastructure to maintain.
When something fails, you get screenshots, step-by-step logs, and a clear explanation of what went wrong. When everything passes, you get confidence that your app actually works - not just that it "seems fine."
This is a natural fit for vibe-coded apps. If you didn't want to write code to build it, you probably don't want to write code to test it either. Describe what should work, and let AI verify it.
The same energy that makes vibe coding productive - plain language in, working results out - should apply to testing too.
Ship vibe-coded apps with confidence. Try Test-Lab - describe your tests in plain English, and our AI agent verifies them in a real browser.
