When the World's Greatest Programmer Admits AI Surprised Him, Listen Closely

Mar 5, 2026·8 min read·AI & Automation

"Shock! Shock! I learned yesterday that an open problem I'd been working on for several weeks had just been solved by Claude Opus 4.6."

That is not a quote from a startup founder trying to sell you something. It is from Donald Knuth, the 87-year-old Stanford professor who wrote "The Art of Computer Programming" — the textbook series that defines how computer scientists think about algorithms. He invented the TeX typesetting system. He is, by most accounts, one of the most rigorous and methodical minds in the history of computing.

When Knuth says "I'll have to revise my opinions about generative AI one of these days," he does not mean maybe. He means it happened.

Key Takeaway: When someone with Donald Knuth's skepticism and credentials publicly revises their opinion on AI, it signals a genuine capability shift — not a marketing cycle. The same reasoning power that solved his maths problem is available to every business right now.

Written by Derek Chua, digital marketing consultant and founder of Magnified Technologies. Derek runs multi-agent AI systems in production across client accounts in Singapore and the region.

Why This Is Different From Normal AI Hype

Every week, someone publishes a piece about how AI is transforming everything. Most of it is noise. This is different for one simple reason: the source.

Knuth has been famously careful about AI claims throughout his career. He is not a person who gets swept up in hype. He published a paper — titled "Claude's Cycles" — documenting how Claude Opus 4.6 solved a discrete mathematics problem he had been working on himself. The paper is 16 pages of academic rigour. He called it "a dramatic advance in automatic deduction and creative problem solving."

For Knuth to put something like that in a published paper is significant. Academics do not casually stake their reputation on hype.

Simon Willison, the developer and tech commentator who surfaced this story, tagged it with "november-2025-inflection" — his shorthand for a broader shift he has been tracking. The November 2025 models (Claude Opus 4.6, GPT o3, Gemini 3.1 Pro) are qualitatively different from what came before. Not incrementally better. Genuinely different in their ability to reason through complex, multi-step problems.

Knuth's experience is one data point. But it is a very loud one.

What AI Reasoning Actually Means for Your Business

Here is the thing: you are probably not trying to solve open problems in discrete mathematics. You are trying to run a business, serve clients, manage a team, and make better decisions faster.

So what does "dramatic advance in automatic deduction and creative problem solving" mean for you?

It means the problems you have been dismissing as "too complex for AI" deserve a second look.

Not the rote tasks. Those were already automated. The interesting question now is the judgment layer — the analysis that used to require a senior person, the synthesis that required someone who could hold many variables in their head at once, the draft strategy that required genuine understanding of your business context.

The reasoning models are getting surprisingly good at all of that.

For SMEs, this matters more than it does for large enterprises. A bigger company has analysts, strategists, and specialists. A smaller one has the owner, maybe a small team, and a budget that does not stretch to hiring more expertise. AI reasoning levels the field in ways that raw text generation never did.

What I Have Seen in Production

At Magnified, we have been running multi-agent AI systems in production for the better part of a year. The shift that happened in late 2025 is real, and we felt it in our own work.

Tasks that previously required careful human oversight — writing a full strategy brief, analysing campaign performance across multiple channels, identifying patterns in client data — now come back from the agents with outputs that are genuinely useful on the first pass. Not perfect. Still requires review. But the quality of the starting point changed.

The most telling sign is how our editing process has shifted. A year ago, we were rewriting 60-70% of AI outputs. Now, for the right tasks with the right models, we are editing 20-30%. The reasoning is tighter. The logic holds together better. The answers actually address the question rather than dancing around it.

That is what Knuth saw, in his domain. And he was surprised by it.

The Skeptic Signal

Here is what I find most interesting about the Knuth story. It is not that AI solved a maths problem. AI has been doing impressive things for a while now.

It is that a genuine skeptic changed his mind.

There is a category of AI convert that is very easy to dismiss: the enthusiast who was always going to believe, who sees every demo and finds it magical. Their conversion tells you nothing. They were already sold.

Then there is the skeptic: the person who demands evidence, who does not take vendor claims at face value, who has thought carefully about what these systems can and cannot do. When that person revises their opinion, you should pay attention.

Knuth is the platonic ideal of a rigorous skeptic. And he just publicly revised his opinion.

If you have been holding off on taking AI seriously because you have tried it and found it underwhelming, that is fair. But if the tools you tried were GPT-3 era or even early GPT-4, you have not tried what exists today. The gap between the 2023 models and the late 2025 reasoning models is not a marginal improvement. It is a different product.

One Thing to Do This Week

Pick a problem in your business that you have assumed AI could not handle. Not a simple writing task — something that requires actual judgment. A strategic decision, a complex analysis, a situation where you normally need to think hard before forming an opinion.

Take it to Claude Opus 4.6, enable extended thinking, and give it real context. Not a one-line prompt — explain the situation the way you would explain it to a smart colleague.

Then see what comes back.

You might be surprised. And if Knuth's experience is anything to go by, your surprise will be the productive kind.

Frequently Asked Questions

What is Claude Opus 4.6 and why is it different from earlier AI models? Claude Opus 4.6 is Anthropic's hybrid reasoning model, released in early 2026. Unlike earlier models that primarily pattern-matched text, it uses extended reasoning to work through multi-step problems more like a person would — holding context, checking its own logic, and arriving at answers through actual deduction rather than association. This is what allowed it to solve a novel maths problem that Knuth had been working on for weeks.

Does this mean AI can now replace expert thinking in my business? Not quite — but the bar for what AI can assist with has risen significantly. The more accurate framing is that AI can now take a first serious pass at problems that previously required expert judgment, freeing up your actual experts to review, refine, and decide rather than starting from scratch. Think of it as a capable junior analyst who is getting faster and sharper, not a replacement for experience.

Which AI tools should my business test for reasoning tasks? The main options for reasoning-capable models right now are Claude Opus 4.6 (with extended thinking), OpenAI o3, and Gemini 3.1 Pro with thinking enabled. For most business tasks, Claude and Gemini also have cheaper Flash-tier models that are good enough for many jobs. The key is to match the model to the complexity of the task — not every question needs the most powerful (and expensive) reasoning model.

How do I know if an AI output is actually good reasoning or just confident-sounding nonsense? This is the right question to ask. Test it on problems where you already know the answer, or where you have enough expertise to evaluate the output critically. Ask the AI to show its reasoning, not just its conclusion. If the logic holds up when you trace through it, you can start trusting it on problems where you are less certain. Never deploy AI reasoning on high-stakes decisions without a human reviewing the chain of logic, not just the final answer.

Is this kind of AI available to small businesses, or only large enterprises? It is available to anyone. Claude Opus 4.6 is accessible via Anthropic's API and Claude.ai subscription. Pricing for API use is higher for the powerful reasoning models, but for most SME use cases you are talking about cents per task, not dollars. The real cost is time spent learning how to prompt effectively and integrate the tools into your workflow — not the model fees.

Derek Chua is the founder of Magnified Technologies, a digital marketing agency that uses AI agents in production across strategy, content, and analytics. He writes about AI adoption for business leaders and their teams.

← All posts