When we first built Test-Lab's browser automation engine, we reached for Goose - Block's open-source AI agent framework. It was the obvious choice. Great developer experience, excellent documentation, and a proven architecture for orchestrating AI-powered workflows.
For months, Goose served us well. We still use it in other projects. This isn't a "Goose is bad" story - it's a "different tools for different jobs" story.
Why Goose was a great starting point
Credit where it's due: Goose is impressive.
Batteries included. Session management, extension loading, structured output, retry logic - all handled out of the box. You can go from "I want to build an AI agent" to "I have a working AI agent" in an afternoon.
The extension system is clever. Goose treats tools (like browser automation or file access) as pluggable extensions. This separation of concerns makes it easy to add capabilities without touching core logic.
It's production-ready. Block uses it internally. That shows. Error handling is thoughtful. Edge cases are covered. You're not beta-testing someone's weekend project.
If you're building a general-purpose AI agent - something that needs to handle diverse tasks, work across different tool ecosystems, or run interactively - Goose is probably the right choice.
Where it got complicated for us
Our use case is narrow: run browser-based QA tests, produce structured reports, capture screenshots. That's it. We don't need interactive sessions. We don't need multiple extensions. We don't need the flexibility that makes Goose powerful.
A few things started adding friction:
Token overhead. Goose's system prompt is comprehensive - identity, guidelines, extension documentation, response formatting. For a general-purpose agent, that context is valuable. For our single-purpose testing agent, it was extra tokens on every request. When you're running thousands of tests, that adds up.
Indirection layers. Our request flow was: our code → Goose → LLM → Goose → browser tools. Each layer is well designed, but debugging an issue meant tracing through multiple abstractions. When a test fails, you want to know why - fast.
Configuration surface area. YAML configs, profile files, recipe definitions. Goose's flexibility comes with configuration options we weren't using. More files to maintain, more things that could drift out of sync.
Dependency on an external CLI. Goose ships as a Rust binary. Installing it in containers, keeping versions in sync across environments, managing the PATH - all solvable, but all extra moving parts.
None of these are flaws in Goose. They're tradeoffs that make sense for a general-purpose tool but created overhead for our specific use case.
What we built instead
We wrote a minimal agent loop that talks directly to the LLM and browser automation tools. The entire thing is around 800 lines of TypeScript.
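The core is simple enough to sketch. What follows is a condensed illustration, not our production code: it assumes an OpenAI-style chat completions client from the `openai` npm package, and `SYSTEM_PROMPT`, `TOOL_DEFINITIONS`, and `executeTool` are stand-ins for pieces sketched or described below.

```ts
import OpenAI from "openai";

const client = new OpenAI();
const MAX_TURNS = 30; // bail out if the model never produces a report

async function runTest(scenario: string): Promise<string> {
  const messages: OpenAI.Chat.ChatCompletionMessageParam[] = [
    { role: "system", content: SYSTEM_PROMPT },
    { role: "user", content: scenario },
  ];

  for (let turn = 0; turn < MAX_TURNS; turn++) {
    // One request per turn: the model either calls tools or answers.
    const completion = await client.chat.completions.create({
      model: "gpt-4o", // illustrative model name
      messages,
      tools: TOOL_DEFINITIONS,
    });
    const reply = completion.choices[0].message;
    messages.push(reply);

    // No tool calls means the model is done: its content is the report.
    if (!reply.tool_calls?.length) return reply.content ?? "";

    // Execute each requested tool and feed the result straight back.
    for (const call of reply.tool_calls) {
      const result = await executeTool(
        call.function.name,
        JSON.parse(call.function.arguments),
      );
      messages.push({
        role: "tool",
        tool_call_id: call.id,
        content: JSON.stringify(result),
      });
    }
  }
  throw new Error(`Test did not finish within ${MAX_TURNS} turns`);
}
```

That loop, plus tool definitions and a report schema, is most of the system. The properties below fall out of it.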
One job, one prompt. No general-purpose system prompt. Our agent knows exactly what it's doing: navigate browsers, verify functionality, capture evidence, produce reports. The prompt is laser-focused on QA testing.
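To give a feel for the scope - this is illustrative only, not our production prompt, but the real one has the same shape: every line is about testing.

```ts
// Illustrative only. The point is the scope: QA testing and nothing else.
const SYSTEM_PROMPT = `You are a QA testing agent. You are given a test
scenario and a set of browser tools. Navigate the application, verify the
described functionality, and capture a screenshot as evidence at each key
step. When finished, output a structured JSON report with: status
(pass/fail), the steps taken, and any failures observed.`;
```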
Direct tool execution. When the LLM wants to click a button or take a screenshot, we execute that immediately. No routing through an extension system. Fewer layers means faster debugging and clearer error messages.
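Concretely, tool execution is one dispatch function. Here's a sketch assuming Playwright as the browser layer - the tool names and argument shapes are illustrative:

```ts
import type { Page } from "playwright";

// One Playwright page per test run, created during setup (omitted here).
let page: Page;

// Direct dispatch: the tool name the LLM requested maps straight to a
// browser call. No extension registry, no routing layer in between.
async function executeTool(
  name: string,
  args: Record<string, string>,
): Promise<unknown> {
  switch (name) {
    case "navigate":
      await page.goto(args.url);
      return { ok: true, url: page.url() };
    case "click":
      await page.click(args.selector);
      return { ok: true };
    case "screenshot":
      return { image: (await page.screenshot()).toString("base64") };
    default:
      // Fail loudly, one stack frame from the cause.
      throw new Error(`Unknown tool: ${name}`);
  }
}
```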
Native Node.js. No external binaries. Same language as the rest of our stack. Same debugging tools. Same deployment pipeline.
Predictable token usage. We know exactly what context the LLM sees on every request. No surprises, no hidden prompt sections, no variation between runs.
The results
After the migration:
- Faster cold starts. No CLI to initialize, no extension system to bootstrap.
- Smaller container images. Dropped the Rust binary and its dependencies.
- Simpler debugging. One codebase, one language, direct execution paths.
- Lower token costs. Leaner prompts, no unused context.
We also gained something less tangible: complete understanding of every line of code that runs our tests. When something breaks at 3am, that matters.
When to stick with frameworks
This isn't advice to "roll your own everything." Frameworks like Goose exist because building reliable AI agents is hard. They encode lessons learned from real production usage.
Choose a framework when:
- Your use case might evolve (you don't know all the tools you'll need yet)
- You want a community maintaining the hard parts
- Interactive sessions or multi-step reasoning are core features
- You need the ecosystem (existing extensions, integrations)
Build custom when:
- Your use case is narrow and well-defined
- You're optimizing for a specific metric (latency, cost, reliability)
- You need deep integration with existing systems
- The framework's flexibility is becoming overhead
For most teams building AI features, reaching for a mature framework is the right call. We just happened to be in the second category.
Hats off to Goose
We shipped our first 10,000 test runs on Goose. It let us validate the idea before investing in custom infrastructure. That's exactly what good tools should do - get you started fast, then get out of the way when you need to optimize.
If you're building AI-powered features, give Goose a look. The Block team has built something genuinely useful. We're still fans.
We just needed something smaller.
Building AI-powered testing? Try Test-Lab - no scripts to write, no infrastructure to manage. Just describe what to test and we handle the rest.
