Vercel research: Static context beats dynamic skills for AI agent accuracy

Vercel's AI SDK team found that embedding documentation in AGENTS.md files achieved 100% pass rates versus 53% for tool-based skills in coding agents. The finding challenges the common enterprise pattern of building complex skill systems to prevent hallucinations.

The Pattern

Enterprise teams building custom AI agents face a recurring problem: hallucinations persist despite adding more tools, skills, and retrieval systems. An agent insists on deprecated Next.js syntax. Another misreads a £716 visa fee as £70,000. The instinct is to add more guardrails.

Vercel's AI SDK team tested a different approach. Instead of building skills that agents invoke on-demand, they embedded knowledge directly into static context files. The results were stark.
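
Concretely, static context here just means a markdown file the agent loads on every run. A minimal sketch of what such a file might contain (illustrative only, not Vercel's actual file):

```markdown
# AGENTS.md

## Framework rules
- This project uses Next.js 16 with the App Router.
- Route handlers live in `app/**/route.ts`; never use `pages/api/`.
- Dynamic route `params` are async - always `await` them.

## Project conventions
- Use the `@/` import alias for everything under `src/`.
- Run `pnpm test` before proposing any commit.
```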

The Data

In evaluations using Next.js 16 documentation:

  • Dynamic skills (tool-based lookups): 53% pass rate
  • Static context (AGENTS.md files): 100% pass rate

The failure mode was predictable: agents forgot to invoke tools or invoked them incorrectly. Static context eliminates the decision entirely - the information is always present.
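
A rough sketch of the two setups using the Vercel AI SDK makes the difference concrete. This is not Vercel's eval harness; it assumes AI SDK v4-style tool definitions (v5 renames `parameters` to `inputSchema`), and `searchDocs`, the model choice, and `maxSteps` value are placeholders:

```ts
import { readFileSync } from 'node:fs';
import { generateText, tool } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

// Placeholder for a documentation search backend.
const searchDocs = async (topic: string) => `...docs for ${topic}...`;

const prompt = 'Add a route handler using the current Next.js API.';

// Dynamic skill: correctness depends on the model *choosing* to call
// lookupDocs before answering. If it skips the call, it falls back to
// whatever (possibly stale) syntax is in its training data.
const viaSkill = await generateText({
  model: openai('gpt-4o'),
  prompt,
  maxSteps: 4, // allow a tool call plus a final answer
  tools: {
    lookupDocs: tool({
      description: 'Look up current Next.js 16 documentation for a topic',
      parameters: z.object({ topic: z.string() }),
      execute: async ({ topic }) => searchDocs(topic),
    }),
  },
});

// Static context: the documentation rides along in the system prompt on
// every turn, so there is no invocation decision to get wrong.
const viaStaticContext = await generateText({
  model: openai('gpt-4o'),
  system: readFileSync('AGENTS.md', 'utf8'),
  prompt,
});

console.log(viaSkill.text, viaStaticContext.text);
```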

What This Means

The finding aligns with VS Code's December 2025 Agent Skills integration and the October 2025 community discussions comparing Skills.md and AGENTS.md approaches. The trade-off is clear:

AGENTS.md strengths: No invocation failures. Consistent across conversation turns. Better for codebase-wide rules and project structure.

Skills strengths: Reusable across projects. Better for large, specialized knowledge sets where context-window costs matter - you don't want to inline 50 capabilities into every prompt.

Vercel's approach, which it calls "Hands vs. Brains", puts knowledge in markdown (AGENTS.md, docs/) and reserves skills for execution-only tasks (API calls, terminal commands).
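
An execution-only skill keeps the "hands" side narrow. A hedged sketch, again using AI SDK-style tool definitions; the npm invocation and filter flag are illustrative, and guidance about when to run tests would live in AGENTS.md rather than in the tool:

```ts
import { tool } from 'ai';
import { z } from 'zod';
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const exec = promisify(execFile);

// "Hands": an action the model cannot take from context alone.
export const runTests = tool({
  description: 'Run the project test suite and return its output',
  parameters: z.object({
    filter: z.string().optional().describe('Optional test name filter'),
  }),
  execute: async ({ filter }) => {
    const args = ['test', ...(filter ? ['--', '-t', filter] : [])];
    try {
      const { stdout } = await exec('npm', args);
      return stdout;
    } catch (err: any) {
      // A failing suite is signal for the agent, not an exception.
      return `${err.stdout ?? ''}${err.stderr ?? ''}`;
    }
  },
});
```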

The Enterprise Angle

For teams building internal agents, this suggests auditing what's implemented as skills. If it's knowledge the agent should always have - coding standards, architecture decisions, approved vendors - it probably belongs in static context.

The risk: over-indexing on either approach. Builder.io's framework treats them as complementary: rules for invariants, skills for task-specific playbooks, commands for workflows. The pattern scales as the number of capabilities grows.

Worth noting: This applies to deterministic knowledge. For dynamic data or third-party APIs, skills and RAG remain necessary. The question is whether you're building a skill because the agent needs to do something, or because you're trying to make it remember something.

The research is specific to coding agents, but the principle - reducing decision points for AI systems - generalizes. Every "should I check this?" moment is a failure point.