When it comes to building and deploying AI at scale, you’d think cost would be the biggest concern. After all, compute isn’t cheap. But according to leaders at fast-growing companies like Wonder and Recursion, cost isn’t what’s keeping them up at night.
Instead, it’s all about speed, scalability, and keeping up with demand. AI is moving incredibly fast—and these companies aren’t waiting around for cloud bills or budget projections to catch up.
Deploy now, optimize later
James Chen, CTO at Wonder—a food delivery and takeout company—recently shared how their AI efforts are surprisingly affordable when you look at the per-order cost. Right now, every meal order includes just 2 to 3 cents worth of AI processing, going up to an expected 5 to 8 cents soon. That’s not what’s breaking the bank.
But Wonder’s real challenge? Running into cloud capacity issues faster than expected.
“The assumption was we’d have unlimited capacity,” said Chen, but the demand told a different story. About six months ago, they started getting nudges from their cloud provider suggesting it was time to expand to another region. Not because of cost—because they were running out of compute and storage.
What does that tell us? In today’s AI landscape, companies are bumping up against infrastructure ceilings before they max out their budgets.
Small models, big dreams
Wonder isn’t just snapping together off-the-shelf tools either. They’ve built custom AI models to personalize restaurant recommendations and logistics tracking. Long-term, they want to move toward micro-models—one tiny, hyper-personalized agent per customer. Think AI concierges based on your order history and browsing.
But there’s a catch: right now, it’s just too expensive to spin up that kind of individualized model for every user.
So, for now, they’re sticking to broader—but still efficient—large models. And to keep things running smoothly, Wonder gives its developers and data scientists lots of freedom—to try new tools, test big ideas, and occasionally blow a budget or two.
They manage that with regular internal cost reviews. But as Chen puts it, predicting budgets in a token-based system is “definitely art versus science.”
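The "art versus science" problem comes from how token billing compounds: spend depends on request volume, tokens per request, and separate input/output prices, each of which drifts as usage grows. A minimal back-of-envelope forecast might look like the sketch below — all numbers (volumes, token counts, prices) are illustrative assumptions, not Wonder's actual figures.

```python
# Hypothetical monthly-cost forecast for a token-billed LLM.
# Request volume, token counts, and per-1k prices are made-up
# assumptions for illustration only.

def monthly_llm_cost(requests_per_day: int,
                     input_tokens: int,
                     output_tokens: int,
                     price_in_per_1k: float,
                     price_out_per_1k: float) -> float:
    """Estimate monthly spend for a token-priced model (30-day month)."""
    per_request = (input_tokens / 1000) * price_in_per_1k \
                + (output_tokens / 1000) * price_out_per_1k
    return per_request * requests_per_day * 30

# Example: 100k requests/day, 2,000 tokens in, 300 tokens out,
# at $0.003 / $0.006 per 1k tokens (hypothetical prices).
cost = monthly_llm_cost(100_000, 2_000, 300, 0.003, 0.006)
print(f"${cost:,.0f}/month")  # $23,400/month
```

Even this toy version shows why forecasts wobble: a small shift in average input length or traffic moves the total linearly, and price changes from the provider move it again.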
One surprising cost driver? The repeated context sent to large language models in every request. “Over 50%, up to 80% of your costs is just resending the same information,” said Chen. So they’re working on smarter ways to manage that.
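A rough model makes Chen's point concrete: in a multi-turn conversation, each request re-sends the full system prompt and history, so cumulative input tokens grow roughly quadratically with turns. The sketch below (with assumed token counts, not Wonder's data) estimates what fraction of input tokens are repeats.

```python
# Sketch of why resent context dominates cost: every request in a
# multi-turn chat re-sends the full prior context. Token counts here
# are rough illustrative assumptions.

def resent_fraction(turns: int, tokens_per_turn: int, system_tokens: int) -> float:
    """Fraction of cumulative input tokens that repeat earlier context."""
    total = 0
    fresh = system_tokens                  # the system prompt is new exactly once
    history = system_tokens
    for _ in range(turns):
        total += history + tokens_per_turn  # full context re-sent each request
        fresh += tokens_per_turn            # only the new turn adds information
        history += tokens_per_turn          # history grows for the next request
    return 1 - fresh / total

# A 10-turn conversation with a 500-token system prompt and
# 150 tokens per user turn:
print(f"{resent_fraction(10, 150, 500):.0%} of input tokens are repeats")  # 85%
```

At these assumed sizes the repeated share already lands in the range Chen describes, which is why techniques like provider-side prompt caching or trimming history before each call can cut the bill substantially.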
Recursion’s ‘vindication moment’
Over in biotech, Recursion has taken a different path. Ben Mabey, the company’s CTO, explained that when they first started building their AI infrastructure back in 2017, the cloud just wasn’t ready.
So they went on-prem—building their own cluster using gaming GPUs like Nvidia’s GTX 1080s. They’ve since scaled up to more powerful hardware like A100s and H100s, running massive training jobs on-prem while mixing in cloud compute for smaller, short-term workloads.
And that early decision paid off.
At one point, when Recursion asked cloud providers for more capacity, the answer was, “Maybe in a year.” Building it themselves turned out to be the faster, cheaper solution.
On-prem vs cloud: When and why it matters
Recursion runs training for its massive image repository (think petabytes of biological data) on premises because that’s more efficient for connected, multi-node workloads. Smaller inference jobs? Those go to the cloud.
Their hybrid approach isn’t just about performance—it’s also about saving money. Moving big training workloads on-prem, Mabey said, is conservatively “10 times cheaper.” Over five years, that could mean cutting total cost in half.
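Those two numbers fit together with simple arithmetic. If training is a fraction f of total compute spend and moving it on-prem makes that slice 10x cheaper, then for the total to halve, training has to be a bit over half the spend. The sketch below checks this under a deliberately simplified model (uniform 10x saving, no migration costs—not Recursion's actual accounting).

```python
# Back-of-envelope check: if fraction f of total spend moves on-prem
# at a 10x saving, new relative total = (1 - f) + f/10.
# Simplified illustrative model, not Recursion's real cost structure.

def total_after_migration(f: float, speedup: float = 10.0) -> float:
    """Relative total spend after moving fraction f of it on-prem."""
    return (1 - f) + f / speedup

# Solve (1 - f) + f/10 = 0.5  ->  f = 0.5 / 0.9 = 5/9
f = 0.5 / (1 - 1 / 10)
print(f"training must be {f:.0%} of spend")                   # 56%
print(f"check: new total = {total_after_migration(f):.2f}")   # 0.50
```

So a halved five-year total is consistent with big training runs making up roughly half or more of the compute bill, which matches Recursion's description of its petabyte-scale training workloads.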
But even with savings on the table, Mabey points out something deeper: companies that hesitate to invest in compute end up paying more in the long run—both in money and innovation.
“If teams are scared of cloud bills, they use less compute. And that limits experimentation,” he said.
So what’s the takeaway?
The AI conversation is shifting. For companies deploying at scale, it’s not about whether they can afford AI. It’s about whether they can move fast enough.
- Wonder learned the hard way that cloud capacity isn’t limitless.
- Recursion proved that early investment in infrastructure can pay off—big.
- Both companies are prioritizing flexibility, experimentation, and speed over micromanaging costs.
In the world of ambitious AI, the real bottleneck isn’t your cloud bill. It’s how bold you’re willing to be.
Keywords: AI deployment, AI scalability, cloud vs on-prem, AI cost optimization, infrastructure for AI, AI models in production, data science budgeting, Wonder AI, Recursion biotech, AI engineering decisions