The Agent That Eats Your Budget

Plus: Washington reverses its Anthropic ban, Korea's $1.5T chip bet, Altman's biopic finds a home.

Roko's Basilisk
July 02, 2026

Here's what's on our plate today:

🧪 Why AI's heaviest users are hunting for ways to use it less.
📰 Washington reverses its ban on Anthropic, Korea's $1.5T chip gamble, and the Altman movie finds a buyer.
🛠️ 3 Things Worth Trying: cut your token drain before finance notices.
🗳️ Poll: Would you run the cheapest competent model if it flew a Chinese flag?

Let’s dive in. No floaties needed.

The Laboratory

TL;DR

The cheapest AI on the market now flies a Chinese flag, and America's biggest AI spenders are the ones bailing first.

Budgets blew up fast: Uber burned through its entire 2026 AI coding budget in four months, and Microsoft began canceling internal Claude Code licenses due to cost.
Agents bill like electricity: Unlike normal software, agents reread entire codebases on a loop, so total usage keeps climbing even though per-token prices have fallen roughly tenfold in two years.
DeepSeek set the floor: A permanent 75% price cut in May 2026 pushed cached-token costs near zero, a floor no closed Western model can match just by trimming its list price.
The market is splitting: Commodity automation now chases the cheapest model, often Chinese, while premium reasoning still commands a price, with Gartner pegging 2026 agent spending at $207B.
What's at stake: Whoever owns the workflow, not the model, may capture the value, leaving enterprises to decide if cheap intelligence is worth the geopolitical exposure.

The strange math making AI's best customers want to use it less

For most of the software era, a useful program cost a fortune to build and almost nothing to run. The first copy absorbed all the engineering, and every copy after that was effectively free, which is part of why software became one of the most profitable businesses ever devised. Agentic AI has quietly inverted that arrangement.

An agent is software that pursues a goal across many self-directed steps rather than answering a single question. It is expensive to run, and it grows more expensive precisely as it becomes more useful, because usefulness invites more use and every unit of use is metered. The companies getting the most out of these systems are therefore the ones watching their bills climb fastest, and some of them have started doing something that would have looked absurd a year ago. They are searching for ways to use the technology less or to buy it somewhere cheaper, and the 'somewhere cheaper' increasingly flies a Chinese flag.

Until recently, this problem was mostly a theoretical concern. Companies knew agentic AI was expensive, but few had operated these systems at a scale where the economics became impossible to ignore. That changed this year, when some of the biggest enterprise AI adopters began publicly admitting that their AI spending was running ahead of budget.

Uber President and COO Andrew Macdonald said it has become increasingly difficult to link the company’s growing use of AI coding tools like Claude Code to tangible consumer-facing innovations. Photo Credit: Fortune

The spring the bills came due

The first companies to say this out loud did so this spring. Uber exhausted its entire 2026 AI coding budget by April, four months into the year, after Claude Code spread across its engineering org faster than its finance models had assumed; its CTO, Praveen Neppalli Naga, confirmed the overrun to The Information. On the Rapid Response podcast in late May, reported by Fortune, Uber president and COO Andrew Macdonald said the spending was getting harder to justify because he could not draw a clear line between all that consumption and the more useful features reaching the people who use Uber. On an earnings call, CEO Dara Khosrowshahi said roughly 10% of the company's committed code was now written by autonomous agents. Around the same time, Microsoft began canceling most of its internal Claude Code licenses, partly due to cost, just months after issuing them, according to The Verge.

These companies are not reckless adopters. They were doing exactly what the tools invite, and the meter underneath was never built for it.

Why agents eat tokens

To understand why these bills escalate so quickly, it helps to look at what companies are actually paying for. Unlike traditional software, where one purchase unlocks repeated use, modern AI is priced more like electricity. Every request consumes computation, which is measured in tokens.

A token is a small chunk of text, and every model charges by how many it reads and writes. A chatbot reply is cheap because it reads a short prompt and writes a short answer. An agent is a different creature. It reads an entire codebase, takes a step, rereads the context to check itself, takes another step, and loops like that for hours, re-consuming the same material on every pass while the meter runs. The result is a paradox the nineteenth century already named, after the economist William Stanley Jevons: as steam engines learned to burn coal more efficiently, Britain burned vastly more of it. The inference cost per token has fallen roughly tenfold over two years, while total consumption has risen more than a hundredfold, which is how the world ends up with cheaper tokens and bigger bills at the same time.
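The arithmetic in the paragraph above can be sketched in a few lines of Python. This is a rough illustration, not a billing model: the token counts and step count are invented for the example, and the function names are hypothetical. Only the tenfold and hundredfold ratios come from the text.

```python
def chat_tokens(prompt_tokens: int, reply_tokens: int) -> int:
    """Tokens billed for one chatbot exchange: read the prompt, write the reply."""
    return prompt_tokens + reply_tokens


def agent_tokens(context_tokens: int, step_output_tokens: int, steps: int) -> int:
    """Tokens billed for an agent that rereads its whole context on every step.

    Each step's output is appended to the context, so later steps read more.
    """
    total = 0
    context = context_tokens
    for _ in range(steps):
        total += context + step_output_tokens  # reread everything, then write
        context += step_output_tokens          # the new output joins the context
    return total


def bill_multiplier(price_drop: float, usage_growth: float) -> float:
    """Change in the total bill when price per token falls and usage rises."""
    return usage_growth / price_drop


# Hypothetical sizes, chosen only for illustration.
chat = chat_tokens(prompt_tokens=200, reply_tokens=300)
agent = agent_tokens(context_tokens=100_000, step_output_tokens=500, steps=50)

print(f"one chat reply: {chat:,} tokens")
print(f"one agent run:  {agent:,} tokens")
print(f"bill multiplier: {bill_multiplier(price_drop=10, usage_growth=100)}x")
```

With these made-up inputs a single agent run consumes thousands of times more tokens than a chat reply, and a tenfold price cut set against a hundredfold rise in usage still leaves the bill ten times larger.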

Even the companies selling agents feel it from the other side. Salesforce has rewritten how it charges for its Agentforce agents more than once, drifting away from per-seat licensing toward charging for the work the agents actually complete, a restructuring its leadership has framed as the future of enterprise pricing. When the firm building the product keeps changing how it bills for it, the difficulty is not a quirk of one customer's budget. It is the shape of the thing.

You can only ration what you control

Faced with the climbing meter, companies have split into two instincts. The first is to ration, to use fewer tokens through caps, cheaper model tiers, and tighter engineering, which is what Microsoft's retreat amounts to. Rationing works only if you own the architecture generating the tokens. A company that has built its product on top of someone else's model cannot simply instruct it to think less. For those firms the second instinct takes over, and the cleaner escape is to stop renting intelligence by the token and run a model they can host themselves.
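For readers who want to picture what rationing looks like in practice, here is a minimal Python sketch of a token cap that downgrades to a cheaper tier as the budget runs down. It is an assumption-laden toy: the `TokenBudget` class, the tier names, and the 80% threshold are all hypothetical, and no vendor's API is being described.

```python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    """Tracks token spend against a cap (all values here are hypothetical)."""

    cap: int
    used: int = 0
    downgrade_at: float = 0.8  # fraction of the cap at which to switch tiers

    def pick_tier(self, requested: int) -> str:
        """Decide which tier may serve a request of `requested` tokens."""
        projected = self.used + requested
        if projected > self.cap:
            return "blocked"   # the request would blow through the cap
        if projected > self.cap * self.downgrade_at:
            return "cheap"     # close to the cap: route to the cheaper model
        return "premium"       # plenty of headroom: use the expensive model

    def record(self, tokens: int) -> None:
        """Add tokens that were actually consumed to the running total."""
        self.used += tokens


budget = TokenBudget(cap=1_000_000)
first = budget.pick_tier(500_000)
budget.record(500_000)
second = budget.pick_tier(400_000)
budget.record(400_000)
third = budget.pick_tier(200_000)
print(first, second, third)
```

The sketch only works for a team that sits between the request and the model, which is the point the paragraph makes: you can ration what you control, and nothing else.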

This is where open-weight models enter, meaning systems whose underlying files are published openly so anyone can download, modify, and run them without paying a per-token fee to a central provider. Airbnb took this route before the budget panic became a headline. Its chief executive told Bloomberg in late 2025 that the company leans heavily on Alibaba's open Qwen model because it is fast and cheap, and that it uses OpenAI's most advanced models too, but not much in production. In other words, Airbnb chose the cheaper engine for the work that actually scaled, and that engine was Chinese.

If companies are trying to escape the token meter, then the obvious question becomes where they can go. Increasingly, the answer is not another American frontier model, but an open-weight model from China.

The case for calling it structural

Companies had been looking for a way around the token meter for some time. What turned that trickle toward open-weight models into a current was DeepSeek.

In late May 2026, the Chinese lab made a 75% price cut permanent, converting what looked like a promotion into a standing floor. It did so by slashing the price of cached tokens, the previously processed pieces of context that agents reread again and again as they work through a task. Since these repeated reads account for roughly 80% to 90% of an agent's token usage, cutting their price sharply reduces the overall cost of running an agent. DeepSeek pushed cached-token prices close to zero, a level one storage executive estimated at roughly 87 times cheaper than comparable Western clouds. The executive credited an architecture that activates only a sliver of the model per token and compresses the memory cost of holding long context. Because the weights are open, a customer can always fall back to running them in-house, which sets a floor no closed rival can match by merely trimming its own list price. Those economics are beginning to influence real developer behavior rather than remaining theoretical.
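To see why the cached-token price matters so much, here is a small Python sketch of a blended price. It is a simplification under stated assumptions: it ignores output tokens, the prices are placeholder values rather than any provider's real rates, and `blended_price` is a hypothetical helper. The 85% cached share is simply a midpoint of the 80% to 90% range quoted above.

```python
def blended_price(fresh_price: float, cached_price: float, cached_share: float) -> float:
    """Average price per million input tokens when `cached_share` of reads hit the cache."""
    if not 0.0 <= cached_share <= 1.0:
        raise ValueError("cached_share must be between 0 and 1")
    return cached_share * cached_price + (1.0 - cached_share) * fresh_price


# Placeholder prices per million tokens, for illustration only.
FRESH = 1.00
CACHED_NEAR_ZERO = 0.01

no_discount = blended_price(FRESH, cached_price=FRESH, cached_share=0.85)
near_zero = blended_price(FRESH, cached_price=CACHED_NEAR_ZERO, cached_share=0.85)
saving = 1.0 - near_zero / no_discount

print(f"no cache discount:  {no_discount:.4f}")
print(f"near-zero cache:    {near_zero:.4f}")
print(f"share of bill gone: {saving:.1%}")
```

Under these assumptions, the list price for fresh tokens is untouched, yet most of the bill disappears, because most of what an agent reads is material it has already read.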

On OpenRouter, a marketplace that routes traffic to whichever model a developer chooses, open-weight systems accounted for roughly a third of all token volume by late 2025, with Chinese models driving much of that growth, per the platform's own study. The catch is that the cheapest path runs through Beijing, and that collides with everything a regulated company frets over regarding data, supply chains, and the chance of sudden sanctions. The smaller and nimbler the team, the easier it is to pocket the savings now. The larger and more scrutinized the company, the longer it lingers at the edge of the water.

Two markets where there was one

Those adoption patterns hint at a larger shift than simply choosing one model over another. They suggest that AI itself may be splitting into different economic layers. The thing cheapening toward the cost of electricity is not a side feature but raw machine intelligence itself, the exact capability the frontier labs spent tens of billions of dollars to build. As that layer commoditizes, the value drifts toward whoever owns the surrounding workflow and the institutional knowledge poured into it. The market is splitting into two, as VentureBeat has argued: a commodity layer of background automation where the cheapest credible model wins, and a premium layer of mission-critical reasoning where trust and capability still command a price. Gartner expects spending on agent software alone to reach $207B in 2026, a figure cited by Fortune, so the layer being commoditized is not small.

The question this leaves on the table is less about price than about what a premium actually buys, and whether a Western enterprise is comfortable with the idea that the cheapest competent answer is one it downloads from a Chinese lab and runs on its own machines. That is a different decision from the one most boards thought they were making when they approved an AI budget in the autumn of 2025.

It returns the story to where it began, only sharper. Usefulness and cost have fused, and companies are now choosing their models to escape a bill they understand. The industry's conversation has therefore shifted. A year ago, the question was who had the smartest model. Today, for many enterprises, it is whether that intelligence generates enough value to justify paying for every thought.

That is the question Uber's operating chief could not answer, the one about what all this consumption is genuinely worth. The token was always going to get cheaper. The harder question is whether companies can ever make intelligence cheap enough that its cost disappears into the background, the way software once did. Until then, the biggest users of AI may remain the ones most motivated to use less of it.
