
Anthropic Let an AI Run a Real Business. Here's What Happened.

7 min read · AI & Automation

Written by Derek Chua, digital marketing consultant and founder of Magnified Technologies. Derek runs multi-agent AI systems in production and advises SMEs across Southeast Asia on practical AI adoption.

Anthropic, the company behind Claude, did something most AI researchers don't: they gave an AI agent actual money, real customers, and a physical vending machine to run. Then they watched what happened.

Phase two of Project Vend, published yesterday, is one of the most honest accounts of where AI agents actually stand today. Not benchmarks. Not demos. A real business, with real losses, real wins, and an AI that kept getting hustled by its own employees.

Key Takeaway: AI agents have become dramatically better at real-world tasks, but the gap between "impressive" and "fully autonomous" is still wide. The lesson for businesses: structure, tools, and human oversight are what turn capable AI into reliable AI.

What Anthropic Actually Did

The setup: an AI agent called Claudius (a modified version of Claude) ran a vending machine in Anthropic's San Francisco office. Customers were Anthropic staff. The inventory was real. The money was real. Claudius handled pricing, sourcing, customer requests, and business decisions.

Phase one, run on Claude 3.7 Sonnet, was a disaster. Claudius sold tungsten cubes at a loss, claimed to be a human wearing a blue blazer, and haemorrhaged money.

Phase two upgraded to Claude Sonnet 4 and 4.5. They gave Claudius proper tools: a CRM system, inventory management, better web search, payment link generation. They added a CEO agent named Seymour Cash. They expanded to New York and London.

The results? Substantially better. Weeks of negative profit were largely eliminated. Claudius got good at sourcing items, setting reasonable prices, and completing normal transactions reliably.

But it still got outsmarted. Repeatedly.

Why It Matters (Beyond the Novelty)

This is not a feel-good AI story. It is a deeply useful one.

The pattern Anthropic documented across 8+ months of real-world operation maps almost exactly to what any business deploying AI agents will encounter. Let me break it down.

Tools matter more than raw intelligence. The single biggest improvement between phase one and phase two was not a smarter model. It was giving Claudius the right tools. A CRM. Proper inventory data. Structured procedures. The same underlying intelligence produced radically different results when it had the right scaffolding.

At Magnified, we see this constantly. Clients who have tried AI tools and found them underwhelming are usually running models against unstructured workflows. The model is not the bottleneck. The setup is.

Bureaucracy is underrated. Anthropic's own observation: the biggest quality improvement came from forcing Claudius to follow procedures. Before quoting a price or promising delivery, it had to run a research step. This made quotes slower, but accurate. The same principle applies in your business. AI does not naturally slow down to double-check. You have to build that into the workflow.
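The "mandatory research step" idea is easy to sketch in plain code. A minimal illustration, assuming hypothetical names (`research_price`, `quote`, the catalogue data) that are mine, not Anthropic's:

```python
from dataclasses import dataclass

@dataclass
class Quote:
    item: str
    price: float
    sources_checked: int

def research_price(item: str) -> list[float]:
    """Stand-in for a real supplier/web lookup step."""
    catalogue = {"tungsten cube": [22.0, 25.5, 24.0]}  # illustrative data
    return catalogue.get(item, [])

def quote(item: str, margin: float = 0.3) -> Quote:
    """Refuse to issue a quote until the research step has produced data."""
    prices = research_price(item)
    if not prices:
        # The procedural guard: no research, no quote.
        raise ValueError(f"No research data for {item!r}; quoting is blocked")
    cost = sum(prices) / len(prices)
    return Quote(item=item, price=round(cost * (1 + margin), 2),
                 sources_checked=len(prices))
```

The point is not the arithmetic. It is that the guard lives in the workflow, not in the model's judgment, so a helpful-but-naive agent physically cannot skip the check.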

The "helpful" problem is real. Claudius kept getting taken advantage of because it was trained to be helpful. A staffer claimed an entire department voted for a specific name for the CEO, with no evidence, and Claudius believed them. Someone proposed an onion futures contract. Claudius and the CEO both agreed enthusiastically, until a human stepped in to mention the 1958 Onion Futures Act. Claudius tried to hire security staff to combat shoplifting, at below minimum wage, without realising it lacked the authority.

This is not unique to Claudius. Every AI assistant has a version of this problem. Helpfulness and naivety are trained together, and untangling them is hard. For now, human oversight on consequential decisions is not optional.

What SMEs Should Know

Opportunities: Phase two showed that AI agents can handle the routine parts of running a business reliably once they have structure. Sourcing, pricing research, customer relationship management, order fulfilment tracking, even managing a multi-location operation across three cities. For repetitive, well-defined tasks, AI is genuinely ready. The custom merch agent (Clothius) was a bright spot, profitable on most items, because it had a clear, narrow scope.

Watch-outs: Every failure in Project Vend happened at the edges: novel situations, adversarial users, decisions that required real judgment or legal knowledge the model did not have. If your business has high-stakes decisions, regulatory exposure, or interactions with customers who might push back, keep humans in the loop. The model will not know what it does not know.

Adoption timeline: This is ready now for structured, repeatable tasks with clear success criteria. Customer support triage, content drafting, data entry, lead research. It is not ready for autonomous operation in complex or adversarial environments without human oversight systems in place.

Derek's Take

I have been running multi-agent systems in production for over a year. The Project Vend 2 results match what I see every day.

The CEO agent, Seymour Cash, is the part that stuck with me most. It was supposed to add discipline and pressure. Instead, it shared all the same blind spots as Claudius, because they were the same underlying model. So the "oversight" was not really oversight. It was another Claude, agreeing with itself, descending into conversations about eternal transcendence at 3am.

This is a real risk in multi-agent architectures. If your oversight system uses the same model, with the same training, you have not added a check. You have added a mirror.

The fix? Either use genuinely different models with different training for the oversight layer, or keep humans in the loop for anything that crosses a risk threshold. At Magnified, we use separate agents with distinct system prompts and scoring rubrics, but we also have humans reviewing anything before it goes to a client or gets published externally.
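The routing logic behind that setup fits in a few lines. A minimal sketch, with assumed names and an arbitrary threshold (nothing here is Magnified's actual system; `automated_review` stands in for a second, differently-trained reviewer model):

```python
RISK_THRESHOLD = 0.5  # assumption: anything above this goes to a human

def automated_review(action: dict) -> bool:
    """Stand-in for a separate reviewer model scoring low-risk actions."""
    # Toy rule: flag anything that smells like a legal commitment.
    return "contract" not in action["description"].lower()

def route(action: dict, human_queue: list) -> str:
    """Send risky actions to humans; let the reviewer gate the rest."""
    if action["risk"] >= RISK_THRESHOLD:
        human_queue.append(action)  # a person decides, not another Claude
        return "escalated"
    return "approved" if automated_review(action) else "rejected"
```

The design choice worth noting: escalation is decided by a fixed threshold, not by the agent itself, so the "mirror" problem cannot vote its own risky decisions through.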

The other observation that rings true: Anthropic's employees eventually stopped trying to game Claudius, not because they found it boring, but because having an AI run a vending machine had become surprisingly normal. That normalisation is itself an interesting signal. AI in business is becoming infrastructure. Not magic. Not threatening. Just another operational layer that needs maintenance and governance.

One Action for This Week

Review one recurring process in your business where a human is currently doing repetitive, well-defined work. Check whether that process has clear inputs, clear outputs, and clear criteria for success. If yes, that's your first AI automation candidate. If no, fix the definition first. AI will inherit whatever ambiguity you leave in the process.


Frequently Asked Questions

What is Project Vend 2? Project Vend 2 is a real-world experiment by Anthropic where an AI agent named Claudius ran a vending machine business across San Francisco, New York, and London. The project tested how well AI agents handle business operations including pricing, sourcing, customer interactions, and financial management, over several months of live operation.

Did the AI make a profit? By the end of phase two, Claudius was mostly profitable, with weeks of negative margins largely eliminated. The improvement came primarily from giving the AI better tools (CRM, inventory data, structured procedures) and upgrading to newer Claude models (Sonnet 4 and 4.5). However, it still needed human intervention in complex or adversarial situations.

What are the main lessons for businesses deploying AI agents? Three lessons stand out: tools and structure matter more than raw model intelligence; AI trained to be helpful will struggle in adversarial or ambiguous situations without guardrails; and oversight systems using the same AI model do not add real checks, only the appearance of them.

Is AI ready to run business operations autonomously? For well-structured, repetitive tasks with clear success criteria, yes. For complex decisions, novel situations, or anything with regulatory or financial risk, no. The Project Vend 2 findings suggest AI is ready as a capable junior operator with human supervision, not as a fully autonomous business runner.


Magnified Technologies helps SMEs in Singapore design and deploy AI automation systems that actually work in production. Get in touch if you want to move beyond pilots.