Photo by Luca Bravo on Unsplash
A new AI model just landed, and it might make your current code buddy look a little sleepy.
Anthropic has officially launched Claude Sonnet 4.5, calling it their most capable AI yet. If you’re a developer, this one’s worth watching. Not because it writes slightly better function names — we’re talking serious upgrades in focus, coding ability, and computer use. In fact, Anthropic says Sonnet 4.5 worked continuously on complex, multistep tasks for over 30 hours without losing the thread. That kind of task endurance is rare with today’s AI models.
So, what’s new with Claude Sonnet 4.5?
Here’s the quick rundown:
- It’s the mid-tier option in Anthropic’s Claude family (smaller than Opus, bigger than Haiku).
- It’s now leading several industry benchmarks, outperforming OpenAI’s GPT-5 Codex and Google’s Gemini 2.5 Pro on key coding tests.
- It can stay focused on one big task for an impressively long time.
Let’s break it down.
It’s seriously good at coding
Claude Sonnet 4.5 is being billed as “the best coding model in the world” on Anthropic’s own site. That’s a bold claim — but it has some numbers to back it up:
- 77.2% on SWE-bench Verified, a benchmark for real-world coding skills.
- 61.4% on OSWorld, a benchmark for operating a real computer interface (browsers, file managers, and the like). That’s up from 42.2% for Claude Sonnet 4.
- Outperforming OpenAI’s GPT-5 Codex (74.5%) and Google’s Gemini 2.5 Pro (67.2%) on that same SWE-bench Verified test.
Developers with early access say Sonnet 4.5 feels better for coding than anything they’ve used recently; Simon Willison, for one, says it edges out his go-to, GPT-5-Codex.
Photo by Nandha Kumar on Unsplash
It’s not just smart — it’s also patient
Multi-hour focus isn’t something you usually see with AI models. They tend to get lost midway through long projects as their “short-term memory” — known as the context window — fills up.
But Anthropic says Sonnet 4.5 stayed locked in for over 30 hours on complex tasks. That’s a step forward for agentic work, where models are asked to follow long workflow chains or grind through big refactors. Earlier Claude models reportedly topped out at around seven hours of continuous work. Thirty-plus hours? That’s a big deal.
New tools for developers, too
Alongside the model, Anthropic dropped a few dev-centric perks:
- Claude Code 2.0, a command-line agent that now supports checkpoints (you can roll back to earlier states) and includes a VS Code extension.
- Claude Agent SDK, which lets devs build their own AI assistants (see the agent-loop sketch after this list for the kind of plumbing it wraps).
- Better file handling, like generating docs, spreadsheets, and slides without leaving the chat interface.
- Long-memory support, through features like context editing that keep extended tasks from blowing past the context window.
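Anthropic’s announcement doesn’t spell out the Agent SDK’s API, so the sketch below doesn’t use the SDK at all. It’s a minimal, hypothetical agent loop written directly against the Messages API (the get_time tool, the prompt, and the loop structure are illustrative assumptions) to show the kind of plumbing an agent SDK is meant to handle for you: call the model, run whatever tools it asks for, feed the results back, and repeat until it answers.

```python
# Hypothetical agent loop against the plain Messages API (not the Claude Agent SDK).
# Assumes `pip install anthropic` and an ANTHROPIC_API_KEY environment variable.
from datetime import datetime, timezone

import anthropic

client = anthropic.Anthropic()

# One toy tool the model is allowed to call.
tools = [
    {
        "name": "get_time",
        "description": "Return the current UTC time as an ISO-8601 string.",
        "input_schema": {"type": "object", "properties": {}},
    }
]

messages = [{"role": "user", "content": "What time is it in UTC right now?"}]

while True:
    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        tools=tools,
        messages=messages,
    )
    # Keep the assistant's turn in the transcript so the next call has full context.
    messages.append({"role": "assistant", "content": response.content})

    if response.stop_reason != "tool_use":
        break  # the model produced a final answer instead of a tool call

    # Execute each requested tool and hand the results back as the next user turn.
    tool_results = []
    for block in response.content:
        if block.type == "tool_use" and block.name == "get_time":
            tool_results.append(
                {
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": datetime.now(timezone.utc).isoformat(),
                }
            )
    messages.append({"role": "user", "content": tool_results})

# Print whatever text the model ended with.
print("".join(block.text for block in response.content if block.type == "text"))
```

The pitch of the Agent SDK is that you don’t hand-roll this loop, or the checkpointing and context management around it, yourself.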
Photo by Mrg Simon on Unsplash
You can access Sonnet 4.5 via API today using the identifier claude-sonnet-4-5, with the same pricing as its predecessor: $3 per million input tokens, $15 per million output tokens.
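If you already use Anthropic’s official Python SDK, trying it is mostly a matter of changing the model string. Here’s a minimal sketch; the prompt is just a placeholder, and it assumes the anthropic package is installed and ANTHROPIC_API_KEY is set in your environment:

```python
# Minimal sketch: one call to Claude Sonnet 4.5 via the official anthropic package.
# Assumes `pip install anthropic` and an ANTHROPIC_API_KEY environment variable.
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5",  # the identifier from Anthropic's announcement
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "Review this function and suggest a cleaner version: ...",
        }
    ],
)

# The response is a list of content blocks; a plain question comes back as text.
print(response.content[0].text)
```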
It’s getting better at math and finance, too
Sonnet 4.5 isn’t just about writing cleaner Python. It also scored high across knowledge and reasoning tasks:
- 92% on Vals AI’s FinanceAgent benchmark (designed to mirror entry-level financial analyst work).
- Solid gains on the AIME 2024 math test and improved multilingual knowledge on the MMMLU benchmark.
That kind of well-rounded performance suggests it’s not just a one-trick code pony.
And yes, it’s a little less… weird
One of the less-talked-about upgrades: Anthropic says Sonnet 4.5 shows a lower tendency toward:
- Sycophancy (just agreeing with users blindly)
- Deceptive or manipulative outputs
- Power-seeking responses
- Delusions or fantasy-enabling feedback
And if you’ve ever gone a little too deep talking to a chatbot about your imaginary startup on Mars, you know why that matters.
Imagine what it could do in the right hands
For Max subscribers, there’s even a five-day preview experiment called “Imagine with Claude” that shows Sonnet 4.5 writing and running software in real time. Think of it as a little lab where you can watch the model build and test as it goes.
It’s too soon to say how Sonnet 4.5 will hold up with wider use — benchmarks are self-reported, and even the best models stumble in strange ways once they hit the real world. But so far, it seems like a clear step up for Anthropic and for AI-assisted development in general.
Photo by Cris DiNoto on Unsplash
If you’re someone who builds software — or even just someone who likes watching AI get better at it — Claude Sonnet 4.5 is worth trying. Especially if your current model keeps forgetting what you just told it 20 minutes ago.
Keywords: Claude Sonnet 4.5, AI coding model, Anthropic AI, Claude Code 2.0, Claude Agent SDK, AI model benchmarks, OSWorld, SWE-bench, AI developer tools, AI staying focused