The AI Plumbing Boom

Plus: SI's AI controversy, Claude gets drug discovery tools, AI law tightens.

Roko's Basilisk
May 20, 2026

Here’s what’s on our plate today:

🧪 Understanding the plumbing behind the AI gold rush.
📰 Sports Illustrated's AI scandal, SandboxAQ meets Claude, a new chapter in AI law.
🧠 Brain Snack: Build a routing layer before lock-in becomes a margin crisis.
🗳️ Poll: Who wins the AI inference gold rush?

Let’s dive in. No floaties needed…

Launch fast. Design beautifully. Build your company's website on Framer.

Framer helps teams design, build, and launch their marketing sites lightning fast. With the ability to publish hundreds of CMS pages in a single click, operate at a global scale with seamless localization, and even host unified content across multiple domains, teams have never been able to ship faster.

Trusted by companies like Miro, Bilt, and Perplexity.

*This is sponsored content

The Laboratory

TL;DR

Inference is the real cash cow: Together AI reportedly tripled its annualized revenue to roughly $1B by early 2026 and is in talks to raise at a $7.5B valuation. It does not build models. It optimizes the infrastructure that runs them.
Open source is pulling enterprises in: The open-source AI model market is projected at $23B in 2026. Enterprises account for nearly 69% of adoption because they want customization, data control, and freedom from vendor lock-in.
The field is getting crowded: Groq, Fireworks AI, Cerebras, and CoreWeave are all raising massive rounds. Low switching costs between open-source providers could keep the market fragmented far longer than traditional cloud did.
Hyperscalers loom large: US cloud providers are projected to spend $600B on AI infrastructure in 2026. If they close the performance gap, billion-dollar inference startups will struggle to compete on convenience and scale.

Understanding the plumbing behind the AI gold rush

Somewhere between the AI model and the answer it gives you, there is a layer of infrastructure that almost nobody talks about. It is not the model itself, not the training data, not the research lab. It is the system that takes a trained model and makes it respond to a query in under a second, millions of times a day, without falling over. This layer, loosely called inference infrastructure, has quietly become one of the fastest-growing segments of the technology industry, rising in both strategic importance and revenue.

In March 2026, The Information reported that Together AI, a cloud AI infrastructure company, was in talks to raise approximately $1B at a $7.5B pre-money valuation, more than doubling its $3.3B mark from February 2025. The company’s annualized revenue had reportedly tripled to roughly $1B since mid-2025.
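
Pre-money and post-money valuations are easy to conflate. A minimal sketch of the arithmetic, assuming the round closes exactly as reported ($1B at $7.5B pre-money) and ignoring option pools, secondary sales, and other deal terms; the function names are illustrative:

```python
def post_money(pre_money: float, new_capital: float) -> float:
    """Post-money valuation = pre-money valuation + new capital raised."""
    return pre_money + new_capital


def new_investor_stake(pre_money: float, new_capital: float) -> float:
    """Fraction of the company the new money buys (simplified: no option pool or other terms)."""
    return new_capital / post_money(pre_money, new_capital)


# Reported figures, in billions of US dollars (the round was still in talks).
pre, raise_amt, prior_mark = 7.5, 1.0, 3.3
print(f"post-money: ${post_money(pre, raise_amt):.1f}B")
print(f"new investors' stake: {new_investor_stake(pre, raise_amt):.1%}")
print(f"step-up vs. Feb 2025 mark: {pre / prior_mark:.2f}x")
```

On those assumptions, new investors would own roughly 11.8% of an $8.5B company, and the pre-money figure alone is about 2.27 times the February 2025 mark.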

As AI infrastructure demand surges, Together AI is reportedly scaling rapidly, with its valuation rising toward $7.5B and annualized revenue reaching about $1B. Photo Credit: Together AI.

Together AI does not build its own flagship AI model or operate a consumer chatbot. Instead, the company aims to run open-source AI models faster and more cheaply than competing infrastructure providers. Serving models, rather than building them, is where the current gold rush is happening.

The GPU landlord model

To understand Together AI’s business, you need to understand a distinction that matters enormously inside the AI industry but barely registers outside it. Training is the expensive, one-time process of teaching a model by feeding it enormous amounts of data. Inference is what happens every time a trained model responds to a question or generates an image. Training gets the headlines, but inference is where the ongoing costs accumulate, and it is the part of the AI stack that Together AI has built its entire company around.
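
One way to see why inference dominates over time is to write the lifetime cost of a deployed model as a one-time training bill plus a serving cost that recurs with every query. The symbols are illustrative, not figures from any company:

```latex
C_{\text{total}}(T) = C_{\text{train}} + \sum_{t=1}^{T} q_t \, c_{\text{inf}},
\qquad
T^{*} = \frac{C_{\text{train}}}{q \, c_{\text{inf}}} \quad \text{when } q_t = q \text{ for every day } t
```

Here q_t is the number of queries served on day t, c_inf is the average cost of serving one query, and T* is the number of days after which cumulative inference spend equals the training bill. Any reduction in c_inf applies to every future query, which is why serving efficiency is worth so much.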

Together AI operates a cloud platform that gives developers API access (a way for software to request services from another system) to over 200 open-source models. The core product is not the models themselves, which are freely available, but the optimized software and hardware that runs them. The company’s Chief Scientist, Tri Dao, created FlashAttention, a technique that reorganizes how the GPU moves data between its memory tiers during computation, cutting out wasteful reads and writes.

In practice, it enables AI models to run significantly faster without changing the model itself. FlashAttention is now used by virtually every major AI lab, but Together AI has the advantage of employing its creator and building successive versions directly into its commercial inference engine.
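
The core idea behind FlashAttention can be sketched in plain Python. Instead of computing every attention score, storing them all, and then normalizing, it walks over the keys and values in blocks and keeps only a running maximum and a running softmax denominator, so the full score matrix never has to be written to slow memory. This is a single-query, CPU-only illustration of that tiling and online-softmax math under simplifying assumptions, not the real GPU kernel; the function names and block size are illustrative:

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def naive_attention(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Reference version: compute and store every score, softmax, then average values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]


def tiled_attention(q: Vector, keys: List[Vector], values: List[Vector],
                    block_size: int = 2) -> Vector:
    """Online-softmax version: visit keys/values one block at a time, keeping only
    a running max (m), running denominator (l), and running weighted sum (acc)."""
    scale = 1.0 / math.sqrt(len(q))
    m, l = -math.inf, 0.0
    acc = [0.0] * len(values[0])
    for start in range(0, len(keys), block_size):
        scores = [dot(q, k) * scale for k in keys[start:start + block_size]]
        m_new = max(m, max(scores))
        # Rescale everything accumulated so far from the old max to the new one.
        correction = math.exp(m - m_new)  # exp(-inf) == 0.0 on the first block
        l *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, values[start:start + block_size]):
            w = math.exp(s - m_new)
            l += w
            acc = [a + w * vj for a, vj in zip(acc, v)]
        m = m_new
    return [a / l for a in acc]


q = [0.5, -1.0, 2.0]
keys = [[1.0, 0.0, 0.5], [0.2, 0.3, -1.0], [-0.5, 1.5, 0.0],
        [2.0, -1.0, 1.0], [0.0, 0.0, 1.0]]
values = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5], [-2.0, 1.0], [0.5, 0.5]]

exact = naive_attention(q, keys, values)
tiled = tiled_attention(q, keys, values, block_size=2)
print(all(abs(a - b) < 1e-9 for a, b in zip(exact, tiled)))  # True
```

Both functions return the same numbers up to floating-point rounding, which mirrors the point above: the speedup comes from how the work is scheduled, not from changing the model’s math.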

The revenue model works across three tiers: pay-per-token API calls starting at $0.10 per million tokens for smaller models, dedicated GPU servers starting at $3.99 per hour, and custom training contracts. The company is also increasingly buying its own GPU servers rather than leasing from cloud providers, a move toward vertical integration that improves margins but increases capital requirements.
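
The choice between the first two tiers reduces to a break-even calculation. A rough sketch using the starting prices quoted above; it assumes the $0.10 per-million rate applies to your model (it is the entry price for smaller models) and that one dedicated server could actually sustain the break-even throughput, neither of which is guaranteed. The workload is hypothetical:

```python
def per_token_cost(tokens: int, usd_per_million: float = 0.10) -> float:
    """Pay-per-token tier: cost grows linearly with tokens processed."""
    return tokens / 1_000_000 * usd_per_million


def dedicated_cost(hours: float, usd_per_hour: float = 3.99) -> float:
    """Dedicated-server tier: cost grows with time, whether or not the GPU is busy."""
    return hours * usd_per_hour


def breakeven_tokens_per_hour(usd_per_million: float = 0.10,
                              usd_per_hour: float = 3.99) -> float:
    """Sustained hourly volume above which the dedicated tier becomes cheaper."""
    return usd_per_hour / usd_per_million * 1_000_000


# Hypothetical workload: a steady 10M tokens per hour for a 30-day month.
hours = 30 * 24
tokens = 10_000_000 * hours
print(f"break-even: {breakeven_tokens_per_hour():,.0f} tokens/hour")
print(f"per-token: ${per_token_cost(tokens):,.2f}  dedicated: ${dedicated_cost(hours):,.2f}")
```

At 10M tokens an hour the per-token tier costs about $720 a month against about $2,873 for a dedicated server; the dedicated tier only pays off once sustained traffic clears roughly 39.9M tokens an hour.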

The company’s reported growth trajectory, from roughly $30M in annualized revenue in early 2024 to an estimated $1B by early 2026, has been remarkable. Private markets research firm Sacra estimated the figure in February 2026, though the number is not audited and may vary across tracking firms depending on methodology and timing.
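
To put that trajectory in perspective, here is the implied compound annual growth rate, treating the unaudited estimates as exact and the span from early 2024 to early 2026 as exactly two years (both simplifying assumptions):

```python
def implied_annual_growth(start: float, end: float, years: float) -> float:
    """Compound annual growth rate (CAGR) that turns `start` into `end` over `years`."""
    return (end / start) ** (1 / years) - 1


# Estimated annualized revenue run-rate, in millions of US dollars.
early_2024, early_2026 = 30, 1_000
print(f"overall multiple: {early_2026 / early_2024:.0f}x")
print(f"implied annual growth: {implied_annual_growth(early_2024, early_2026, 2):.0%}")
# "Tripled since mid-2025" implies a mid-2025 run-rate of roughly one third of $1B.
print(f"implied mid-2025 run-rate: ~${early_2026 / 3:.0f}M")
```

On these figures, revenue grew about 33-fold, an implied rate of roughly 477% a year, from an implied mid-2025 run-rate of around $333M.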

Together AI could grow this fast because it sits in a part of the AI market that is being structurally reshaped as adoption spreads.

Why is the money flowing?

The investor thesis behind Together AI rests on a structural shift in how enterprises are adopting AI. For the first two years of the generative AI boom, most companies defaulted to proprietary APIs from OpenAI or Google.

However, that default is now giving way to one in which companies turn to open-source models for their business functions.

According to a market research report from the Business Research Company, the open-source AI model market is projected at $23B in 2026, growing at 21.1% annually, with enterprises accounting for nearly 69% of adoption.
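
A 21.1% annual growth rate is easier to grasp as a doubling time. A short sketch; the projection line is a mechanical extrapolation that assumes the rate stays constant, not a forecast from the report:

```python
import math


def doubling_time(annual_growth: float) -> float:
    """Years for a quantity compounding at a constant annual rate to double."""
    return math.log(2) / math.log(1 + annual_growth)


def project(value: float, annual_growth: float, years: int) -> float:
    """Compound a starting value forward at a constant annual rate."""
    return value * (1 + annual_growth) ** years


print(f"doubling time at 21.1%/yr: {doubling_time(0.211):.1f} years")
# Mechanical extrapolation of the $23B (2026) figure, assuming a constant rate.
print(f"2029 at a constant 21.1%/yr: ${project(23, 0.211, 3):.1f}B")
```

At that pace the market roughly doubles every 3.6 years.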

The reason behind this shift is straightforward: open-weight models (models whose internal parameters are publicly available for anyone to use and modify) let companies customize AI for their specific needs, avoid vendor lock-in, and keep sensitive data on their own infrastructure.

In the case of Together AI, however, the investor base reveals a larger strategic shift. Alongside US venture firms like General Catalyst and Kleiner Perkins are strategic investors, including NVIDIA, Salesforce, and Prosperity7 Ventures, the venture arm of Saudi Aramco. Prosperity7 has steadily expanded its presence across the AI inference ecosystem, backing companies like Groq and participating in multiple funding rounds for Together AI.

This is part of a broader strategy by Saudi Arabia to capture a greater share of AI’s global physical infrastructure.

The kingdom is building GPU-dense data center capacity designed to support global AI workloads and has separately committed $1.5B to expand Groq’s inference footprint in the region.

The crowded middle

Together AI operates in an increasingly crowded AI inference market, one that has attracted enormous capital and a growing roster of well-funded competitors approaching the problem from very different technical directions. Groq built its own custom inference chip, the Language Processing Unit, and raised $750M at a $6.9B valuation in 2025.

Fireworks AI focused on software reliability and production inference infrastructure, reaching a $4B valuation after raising $250M the same year. Cerebras Systems pursued wafer-scale processors optimized for raw throughput, while CoreWeave concentrated on large-scale GPU infrastructure, securing more than $25B in capital commitments. According to corporate spending data from Ramp, Together AI and Fireworks AI each account for roughly 10% of AI infrastructure spending among tracked customers, ahead of CoreWeave and Nebius.

The deeper question raised by this influx of competitors is whether AI inference infrastructure will eventually consolidate the way cloud computing did. Early cloud markets narrowed from dozens of providers to a handful of dominant players because switching costs became prohibitively high once companies had invested deeply in ecosystems like AWS or Azure. Open-source AI models could disrupt that pattern.

A company running models like Meta’s Llama or Mistral AI’s open-weight models on Together AI can, in theory, migrate those workloads to Fireworks, Groq, or even its own infrastructure without retraining models or rebuilding applications from scratch. If those switching costs remain low, the inference market may stay fragmented far longer than traditional cloud computing, benefiting customers while making it harder for any single provider to build a durable moat.

What the giants are doing

The complication for every company in this category is that AWS, Google Cloud, and Microsoft Azure are investing at a scale that dwarfs the entire startup cohort combined. US cloud providers are projected to spend $600B on AI infrastructure in 2026, doubling their 2024 spending, according to the Council on Foreign Relations. The hyperscalers are building custom AI chips (Google’s TPUs, Amazon’s Trainium, Microsoft’s Maia), expanding GPU capacity, and offering their own optimized inference services. Their advantage is not necessarily technical superiority on any single workload but rather the gravitational pull of existing enterprise relationships and integrated ecosystems: for a company already running on AWS, adding inference through Amazon’s own services requires less organizational friction than onboarding a new vendor.

Together AI and its peers are betting that purpose-built AI infrastructure will consistently outperform general-purpose clouds on inference workloads, and that the performance and cost gap will be wide enough to justify the friction of a separate vendor relationship.

The hyperscalers are betting that most enterprises will prioritize convenience and integration over marginal gains in speed or price. Both positions have historical precedent, and neither has been definitively tested in this market.

The AI inference economy is growing so fast that all these companies are expanding simultaneously, a condition that depends on demand continuing to outstrip supply. If demand growth slows or if the hyperscalers close the performance gap, the billion-dollar infrastructure startups will face a very different competitive environment.

Whether a company built on research breakthroughs and software optimization can hold a $7.5B valuation against competitors spending hundreds of billions on the same problem sits at the intersection of hardware economics, software lock-in dynamics, and how quickly open-source models close the remaining quality gap with proprietary ones. That intersection is where the next few years of the AI industry will be decided.

Brain Snack (for Builders)

If your AI product runs on a single inference provider, you’re one price hike away from a margin crisis. Build a routing layer that lets you swap providers per workload. The switching cost is low today, and it won’t stay that way.
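
A minimal sketch of such a routing layer, assuming each provider is wrapped behind the same prompt-in, text-out function. The `Provider` and `Router` classes, the provider names, and the prices are hypothetical; real vendor SDK or HTTP calls would go inside each wrapper:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Provider:
    """One inference vendor behind a uniform prompt-in, text-out interface (hypothetical)."""
    name: str
    usd_per_million_tokens: float
    complete: Callable[[str], str]


@dataclass
class Router:
    """Maps each workload to its own ordered list of providers."""
    routes: Dict[str, List[Provider]] = field(default_factory=dict)

    def register(self, workload: str, providers: List[Provider]) -> None:
        # Cheapest first; swapping vendors for a workload is just a new register() call.
        self.routes[workload] = sorted(providers, key=lambda p: p.usd_per_million_tokens)

    def complete(self, workload: str, prompt: str) -> Tuple[str, str]:
        errors = []
        for provider in self.routes[workload]:
            try:
                return provider.name, provider.complete(prompt)
            except Exception as exc:  # timeout, rate limit, outage, ...
                errors.append(f"{provider.name}: {exc}")
        raise RuntimeError("all providers failed: " + "; ".join(errors))


def flaky(prompt: str) -> str:
    raise TimeoutError("simulated outage")


def echo(prompt: str) -> str:
    return prompt.upper()


router = Router()
router.register("summarize", [
    Provider("provider-b", 0.20, echo),
    Provider("provider-a", 0.10, flaky),
])
print(router.complete("summarize", "hello"))  # ('provider-b', 'HELLO')
```

The cheaper provider is tried first, fails, and the request falls through to the next one. Keeping routing keyed by workload means a price hike on one vendor only requires re-registering the affected workloads, not rewriting application code.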

Be the future of AI

Hunting for AI roles eats hours. Careers pages, group chats, internal referrals, a dozen job boards.

The Athyna AI Job Board does the watching for you. We track openings at frontier AI companies, match them to your profile, and notify you when a role hits a 75% matching index.

Set up a profile and let the matches come to you.