The AI Wrapper Era Is Over. The Age of Claude Agent Wrappers Has Begun.
Anthropic's Tariq Shihipar breaks down what it actually takes to build production-grade agents - from Bash-first tooling to file-system-driven context engineering.
Tariq Shihipar, the lead behind Claude Code at Anthropic, ran a 90-minute workshop on building production-grade agents. Since Manus launched, interest in agents has surged, but how to build agents that actually work in production has remained frustratingly vague. This workshop was Anthropic's direct answer to that question.
Beyond services that merely wrap an LLM API, what needs to change when designing agent-native applications? Four takeaways stood out.
Bash Is the Most Powerful Tool
You don’t need dozens of custom tools.
Software that already exists on Linux, including ffmpeg, jq, curl, and more, can handle most tasks when composed through Bash commands. Agents teach themselves how to use these tools by reading man pages and --help output, which means you don’t need to stuff every tool specification into the prompt. Less context window waste, same capability.
The implication: instead of building bespoke integrations for every capability, you hand the agent a shell and let it compose existing software. The entire universe of CLI tools becomes the agent’s toolbox without any of them needing to be registered in advance. The catch is that Bash access requires careful sandboxing; a misconfigured environment can give an agent unintended reach.
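The "hand the agent a shell" idea reduces to a single tool behind the agent loop. A minimal sketch, assuming a Python-based agent harness (the function name, sandbox path, and limits here are illustrative, not Anthropic's implementation):

```python
import subprocess

def run_bash(command: str, workdir: str = "/sandbox", timeout: int = 60) -> str:
    """Execute an agent-issued shell command and return its output.

    This is the one tool that replaces dozens of bespoke integrations:
    the agent composes ffmpeg, jq, curl, etc. through `command` itself.
    A real deployment needs container- or user-level isolation; this
    sketch only restricts the working directory and wall-clock time.
    """
    result = subprocess.run(
        ["bash", "-c", command],
        cwd=workdir,          # confine file operations to the sandbox
        capture_output=True,
        text=True,
        timeout=timeout,      # kill runaway commands
    )
    # Return both streams so the agent can read `--help` text and
    # error messages the same way it reads normal output.
    return result.stdout + result.stderr
```

Because the tool is just "run this string in a shell," the agent can discover capabilities on its own (`run_bash("jq --help")`) instead of you registering every tool spec up front.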
The Core of the Agent Loop Is Verification
Gather Context. Take Action. Verify Work.
The criterion for whether to use an agent is simple: can you verify the output? Code is easy to verify with compilers and linters. Research tasks require designing verification logic separately, like requiring source citations with every claim.
This is the insight most teams miss. They focus on making agents smarter when they should be making agents more verifiable. A mediocre model with strong verification loops will outperform a brilliant model with none. Deterministic tools, such as file existence checks, syntax validation, and type checking, belong inside the loop to catch hallucinations before they propagate.
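A verification step of this kind can be entirely deterministic. A minimal sketch for agent-written Python files, using only existence and syntax checks (the function name and check set are illustrative; a real loop would add linters and type checkers):

```python
import ast
import pathlib

def verify_python_file(path: str) -> list[str]:
    """Deterministic checks run inside the agent loop after each action.

    Returns a list of problems; an empty list means the action passed.
    Feeding these strings back to the model catches hallucinations
    (files never written, code that doesn't parse) before they propagate.
    """
    p = pathlib.Path(path)
    if not p.exists():
        # File existence check: the agent claimed to write this file.
        return [f"{path}: file was never written"]
    try:
        # Syntax check: cheaper and more reliable than asking the
        # model whether its own code is valid.
        ast.parse(p.read_text())
    except SyntaxError as e:
        return [f"{path}: syntax error on line {e.lineno}"]
    return []
```

The point is not sophistication but determinism: the verifier never hallucinates, so its output is a trustworthy signal to loop on.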
Even Non-Dev Work Gets Solved Through Code Generation
Even simple tasks like checking the weather or analyzing emails are better handled with code than text responses. The approach: let the agent write scripts on the fly to connect multiple APIs and process data.
A significant portion of Claude Code users are in non-dev roles, including marketing, finance, and operations. Treating data analysis and repetitive tasks as disposable code (scripts written once, run once, and discarded) is becoming a standard workflow.
This reframes what “coding” means in the agent era. The agent doesn’t need a pre-built integration with your email provider. It writes a script that calls the API, filters the data, and returns results, all generated at runtime.
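The run-once pattern itself is small enough to sketch. Assuming a Python harness, a hypothetical `run_once` helper executes an agent-generated script and deletes it immediately, so nothing accumulates as a maintained integration:

```python
import os
import subprocess
import sys
import tempfile

def run_once(script: str, timeout: int = 120) -> str:
    """Execute an agent-generated script exactly once, then discard it.

    The script is disposable by design: it is written to a temp file,
    run in a fresh interpreter, and removed in the same call.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(script)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, path],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return result.stdout
    finally:
        os.remove(path)  # the script never outlives its single run
```

In practice the `script` string would be generated at runtime to call whatever API the task needs; the harness only cares that it runs once and disappears.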
Context Engineering Lives in the File System
Beyond prompt engineering, you need to design the environment the agent works in.
Giving an agent new capabilities isn’t about complex fine-tuning. It’s about handing it a folder with well-written markdown files and scripts. Tariq described this as being “file system pilled.” Agents have state, and the core of agent architecture is a sandboxed environment where the agent has access to a file system and can execute Bash commands.
The file system is the agent’s long-term memory, reference library, and workspace in one. A CLAUDE.md file at the project root isn’t just documentation; it’s the agent’s onboarding guide. A scripts/ directory isn’t just utilities; it’s the agent’s toolkit.
The Paradigm Shift
Just as web development moved from jQuery to React, from imperative DOM manipulation to component-based architecture, agent development is moving from raw prompt calls to structured frameworks.
The question is no longer “what should I ask?” It’s “what permissions and environment should I provide?” Teams that understand this distinction, that agent performance depends more on the system around the model than the model itself, are the ones building software that holds up under real workloads.
Based on Tariq Shihipar’s workshop at Anthropic.