Together AI’s Middle Layer

Plus: restaurant robot chaos, UK quantum push, and human-proof shopping agents.

Here’s what’s on our plate today:

  • 🧪 Together AI and the fight to make open models usable at scale.

  • ⚡ Restaurant robot chaos, UK quantum push, human-proof shopping agents.

  • 🧰 Three Things Worth Trying: Together AI, vLLM, and Baseten.

  • 🗳️ Poll on where the real long-term value in AI infrastructure sits.

Let’s dive in. No floaties needed…

Label Faster. Train Smarter. Ship Better Models.

Multimodal models are only as good as the data behind them. Our trainers work across text, audio, and image pipelines—handling transcription, labeling, and annotation tasks with speed and consistency.

  • Audio transcription and speech data labeling

  • Image and video annotation for vision model pipelines

  • Multilingual coverage across Portuguese, Spanish, and English

Part of our vetted LATAM talent network, working in U.S.-aligned time zones.

*This is sponsored content

The Laboratory

How Together AI is bridging the gap between models and the real world

TL;DR

  • The middleman thesis: Together AI provides the infrastructure that lets enterprises run open-source AI models at production speed without managing their own GPUs or engineering stacks.

  • The gap is real: DeepSeek-R1 is free to download but has 671B parameters and needs multiple servers to function. Together AI serves it at 85 tokens per second.

  • Research as a moat: The company’s co-founder created FlashAttention, an algorithm now used by virtually every major AI model in production. That research pipeline, not just GPU access, is what separates Together AI from commodity cloud providers.

  • Building beyond speed: Two acquisitions in six months (CodeSandbox for code execution, Refuel.ai for data quality) signal a deliberate push to own more of the workflow between procuring AI and running it in production.

  • The squeeze is coming: NVIDIA’s $20B Groq deal and hyperscaler investments in open-source model hosting are closing in from above. Together AI’s survival as an independent category leader depends on whether its research keeps outpacing the giants’ ability to replicate or absorb it.

Founded in 2022, Together AI is building an open-source AI cloud used by over 450k developers and companies, including Salesforce, Zoom, SK Telecom, and The Washington Post, aiming to deliver high-performance AI without reliance on proprietary platforms. Photo Credit: TFN

The relationship between large enterprises and end users depends heavily on the mediators that facilitate the flow of goods and services from the former to the latter. Take, for example, a simple bag of potato chips; the company that produces and packages the chips relies on intermediate wholesalers and retailers to ensure the product reaches the end user. This creates an interdependent system that relies on the smooth functioning of different layers to keep it going.

A similar relationship also exists in the AI industry, where a layer of enterprises enables customers to make the most of models developed by frontier labs.

Today, when people think about AI in business, they picture model makers like OpenAI, Google, and Anthropic. What they often overlook is the messy, practical process of getting AI to actually work inside real organizations, which depends on a different kind of company altogether.

Together AI belongs to the category of companies that build and maintain the infrastructure that lets developers and businesses take open-source AI models, the freely available alternatives to proprietary systems, and run them at production speed and scale. These companies are solving a critical problem in the AI ecosystem by providing the real-world plumbing that brings the power of frontier AI tools to enterprises, who can then use them as needed.

Download vs. deploy

To understand what Together AI does, it helps to look at a key tension in the AI industry. On one side are powerful open models like Meta’s Llama and China’s DeepSeek-R1 that anyone can technically download and use. On the other side are companies that want to use these models but don’t have the computing power, engineering teams, or infrastructure needed to run them reliably.

As more open-source models enter the market and more businesses look to AI to solve complex problems, the gap has grown larger than most people realize. DeepSeek-R1, for instance, is one of the most capable open-source models available today, but its 671B parameters mean it must run across multiple servers just to work. In the words of Together AI CEO Vipul Ved Prakash, it is a fairly expensive model to run inference on. Downloading the model is easy; running it reliably at production speed is much harder.

Together AI is trying to solve this problem by supporting over 200 open-source models across text, image, audio, video, and code, accessible through APIs that handle all the underlying hardware complexity.

According to the company, developers can call a model as they would any cloud service, without managing GPUs, configuring networking, or optimizing memory allocation. It further claims that its inference speeds are two to three times faster than hyperscaler alternatives and, in some cases, dramatically more. This speed is driven by the company’s approach of going beyond being a cloud service provider and conducting AI research.
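As a rough illustration of what "calling a model like any cloud service" means in practice, here is a minimal sketch of an OpenAI-style chat completion request. The endpoint URL and model identifier are assumptions for illustration; check Together AI's documentation for current values.

```python
import json

# Illustrative endpoint; consult Together AI's docs for the current URL.
API_URL = "https://api.together.xyz/v1/chat/completions"

def build_request(prompt: str, model: str = "deepseek-ai/DeepSeek-R1") -> dict:
    """Assemble an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }

payload = build_request("Summarize FlashAttention in one sentence.")
print(json.dumps(payload, indent=2))

# Sending it is a single authenticated POST (requires an API key), e.g.:
# requests.post(API_URL, headers={"Authorization": f"Bearer {key}"}, json=payload)
```

The point is the shape of the abstraction: the caller supplies a model name and messages, and everything below that line, GPU placement, networking, memory, is the provider's problem.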

Speed from the lab

One of Together AI’s most significant contributions to the research space is FlashAttention, an open-source algorithm created by co-founder and Chief Scientist Tri Dao. To understand what it does, consider that every time an AI model processes text, it must determine how each word relates to every other word in the input. This process, called attention, is the most computationally expensive part of running a model. The longer the text, the worse it gets, because the cost grows quadratically with length. FlashAttention solves this by reorganizing the math so that data stays in the chip’s fastest on-chip memory rather than being shuffled back and forth between slower memory. The result is that memory usage scales linearly rather than quadratically, enabling faster training, longer context handling, and significantly lower costs.
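The idea at FlashAttention's core, the online softmax, can be sketched in plain NumPy. This is a memory-layout illustration only, not the fused GPU kernel: keys are processed in blocks while keeping just a running max, a running denominator, and a running output, so the full n × n score matrix is never materialized.

```python
import numpy as np

def naive_attention(Q, K, V):
    # Materializes the full n x n score matrix: memory grows quadratically.
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    return (P / P.sum(axis=-1, keepdims=True)) @ V

def streamed_attention(Q, K, V, block=32):
    # Online softmax: stream over key blocks, keeping only O(n) state.
    n, d = Q.shape
    out = np.zeros((n, V.shape[-1]))
    m = np.full((n, 1), -np.inf)      # running row max
    denom = np.zeros((n, 1))          # running softmax denominator
    for start in range(0, K.shape[0], block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        S = Q @ Kb.T / np.sqrt(d)     # n x block, never n x n
        m_new = np.maximum(m, S.max(axis=-1, keepdims=True))
        scale = np.exp(m - m_new)     # rescale previous partial results
        P = np.exp(S - m_new)
        denom = denom * scale + P.sum(axis=-1, keepdims=True)
        out = out * scale + P @ Vb
        m = m_new
    return out / denom

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((128, 16)) for _ in range(3))
print(np.allclose(naive_attention(Q, K, V), streamed_attention(Q, K, V)))  # True
```

The two functions compute identical results; the streamed version just never holds more than one key block's worth of scores at a time, which is what lets the real kernel keep its working set in fast on-chip memory.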

Today, virtually every major AI model in production uses FlashAttention, and its fourth version, announced at Together AI’s AI Native Conf in early 2026, was redesigned from scratch for NVIDIA’s latest Blackwell GPUs, with new techniques that further push inference performance.

This research-to-production pipeline, where breakthroughs in the lab translate directly into speed and cost advantages on the commercial platform, is Together AI’s core moat. It is also something that most competing GPU cloud providers cannot easily replicate, as they lack the research talent to develop innovations at this level.

Owning the workflow

However, despite the research, companies like Together AI operate in a space open to disruption from hyperscalers like AWS or Google Cloud, which can provide similar cloud infrastructure at lower cost. And even if research-driven speed advantages make them stand out, it is only a matter of time before competitors catch up.

Together AI appears to recognize this risk, and its recent moves suggest a deliberate strategy to build beyond pure inference hosting. In December 2024, it acquired CodeSandbox, an Amsterdam-based developer platform with over 4.5M monthly users, to add code execution capabilities directly into its inference platform. This matters because AI agents (software that can take actions autonomously rather than just answer questions) need the ability to write and run code in secure environments. By integrating CodeSandbox, Together AI positioned itself for the agentic AI wave before most competitors.
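To see why secure execution matters for agents, here is a toy sketch of the pattern. This is not CodeSandbox's actual API (its product runs isolated environments with far stronger guarantees); it only illustrates the minimum requirement: model-generated code runs in a separate interpreter process with a hard timeout, never in the agent's own process.

```python
import subprocess
import sys

def run_untrusted(code: str, timeout: float = 5.0) -> str:
    """Run model-generated code in a separate process with a hard timeout.

    A real sandbox would also restrict filesystem and network access;
    this toy version only isolates the interpreter and bounds runtime.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout,
    )
    return result.stdout if result.returncode == 0 else result.stderr

print(run_untrusted("print(21 * 2)"))  # → 42
```

Even this crude version shows the contract an agent platform has to offer: code goes in, output or an error comes back, and a misbehaving program cannot take the host down with it.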

In May 2025, it acquired Refuel.ai, a data quality startup founded by Stanford alums, to address one of the most persistent bottlenecks in enterprise AI: the messy, unstructured state of most business data.

Refuel’s tools transform raw data into clean, structured datasets that AI models can actually use, with the company claiming 50% fewer errors than existing approaches on certain tasks. A Gartner analyst told TechTarget that it was a ‘complementary acquisition’ that gives Together AI interesting data assets on top of its infrastructure foundation.

The pattern is clear: Together AI is trying to own more of the workflow that sits between a developer deciding to use AI and that AI actually running in production.

The squeeze ahead

Together AI’s trajectory raises a question that extends beyond any single company. As AI shifts from experimental technology to enterprise infrastructure, a fundamental tension is emerging: does the value sit with the companies building the models, the cloud providers hosting them, or the infrastructure layer in between that makes them work?

The company has clearly placed its bet. With $534M in total funding, a $3.3B valuation, and revenue growing faster than most enterprise software companies, it is positioned as a serious player in the AI infrastructure market.

However, the road ahead is not without obstacles. NVIDIA’s $20B deal to acquire Groq’s inference technology and engineering talent in late 2025 signals that the chip giant is moving aggressively into the inference optimization space, the same space where Together AI competes.

Hyperscalers like AWS and Google Cloud are investing heavily in serving open-source models on their own platforms, with distribution advantages that a startup cannot easily match.

In this situation, Together AI’s strongest advantage may be its research pipeline, which has produced technologies such as FlashAttention. These give it real technical benefits that competitors must either license or replicate. If the company keeps turning its research into faster and cheaper AI performance, it maintains a head start that money alone cannot easily buy.

The AI industry’s supply chain is starting to look a lot like those of other mature industries. The company that builds the product is not always the one that gets it to the customer. Just as that bag of chips needs wholesalers, distributors, and retailers to travel from a factory floor to a store shelf, AI models need an infrastructure layer to travel from a research lab to an enterprise workflow. Together AI and companies like it are building that layer. The question is no longer whether these middlemen are necessary. It is whether they can stay independent long enough to define the category before the giants above them decide to absorb it.

Quick Bits, No Fluff

  • Restaurant robot meltdown: A dancing restaurant robot in San Jose smashed plates and chopsticks until staff physically dragged it away, raising fresh questions about basic safety controls. 

  • UK backs quantum hard: The UK is pledging £1B to keep quantum talent at home after falling behind the U.S. in AI, a blunt sign that governments now see frontier tech as a retention war. 

  • Proof-of-human shopping: World launched a verification tool for AI shopping agents, trying to prove a real human is behind automated purchases before agentic commerce turns into fraud spam.

The context to prepare for tomorrow, today.

Memorandum merges global headlines, expert commentary, and startup innovations into a single, time-saving digest built for forward-thinking professionals.

Rather than sifting through an endless feed, you get curated content that captures the pulse of the tech world—from Silicon Valley to emerging international hubs. Track upcoming trends, significant funding rounds, and high-level shifts across key sectors, all in one place.

Keep your finger on tomorrow’s possibilities with Memorandum’s concise, impactful coverage.

*This is sponsored content

Thursday Poll

🗳️ Where do you think the real long-term value sits in AI infrastructure?


3 Things Worth Trying

  • Together AI: Run open models through a single API without managing GPUs, infrastructure, or model serving yourself. Good for testing whether “open” can feel production-ready. 

  • vLLM: Open-source inference engine for serving large language models fast and cheaply. Useful if you want to see what it actually takes to self-host rather than rent forever. 

  • Baseten: AI deployment platform for shipping models and compound AI systems with built-in serving, scaling, and observability. A good way to compare “host it yourself” versus “let someone else make it usable.”

The Toolkit

  • Pika: Fast, social-first AI video tool built for creators who want quick scenes, effects, and remixable clips without pro editing skills.

  • Runway: Higher-end AI video platform for more controlled, cinematic generation and real production workflows.  

  • CapCut: Consumer-friendly editing stack with AI video tools, captions, speech features, and templates already built for short-form distribution.

Rate This Edition

What did you think of today's email?
