ElevenLabs Goes Off-Cloud

Plus: Anthropic security story, Apple's CEO trap, and AI as listening tutor.

Here’s what’s on our plate today:

• 🧪 ElevenLabs takes voice AI off the cloud.
• 📰 Anthropic security flap, Apple's CEO minefield, AI teaches listening.
• 🛠️ Three tools worth trying: Leonardo, Modal, Quillbot.
• 🗳️ Poll: Who wins the on-device voice race?

Let’s dive in. No floaties needed…

2026 Salary Report: U.S. vs Global hiring.

Athyna's Salary Report breaks down real salary data across AI, Tech, Data, Design, and more—so you can see exactly where the savings are.

The numbers might surprise you.

*This is sponsored content

The Laboratory

TL;DR

From cloud to closed environments: ElevenLabs, valued at $11B after a $500M Series D, is launching on-premises and on-device voice AI for customers such as government agencies and hospitals where data cannot leave the building, with early access expected in the first half of 2026.

Purpose-built, not just compressed: The on-device models are designed from scratch for constrained hardware like entry-level GPUs and ARM chips, not simply shrunk versions of cloud models, though the company acknowledges they won’t fully match cloud capabilities.

Regulation is forcing the timeline: The EU AI Act’s high-risk requirements take effect in August 2026, and U.S. frameworks like HIPAA and FedRAMP already make cloud-only deployment a non-starter for many enterprise buyers.

The quality gap is the real risk: Voice synthesis is unforgiving because listeners notice even subtle degradation, and if on-device models sound noticeably worse, adoption in customer-facing scenarios could stall regardless of compliance benefits.

Inside ElevenLabs’ plan to unlock strictly controlled environments for AI deployment

For most of artificial intelligence’s short commercial history, voice synthesis has lived in the cloud: users send text to a remote server, which runs it through a model and returns audio. That architecture works for content creators dubbing YouTube videos or developers prototyping chatbots. But it hits a wall when a customer prioritizes data security over ease of use: a government agency handling classified communications, a hospital system processing patient calls, or a defense contractor operating in a restricted environment. In these cases, data cannot leave the building, let alone the country.

The solution is to deploy AI in closed environments, on devices that never connect to external servers or leave protected facilities. That is exactly what companies like ElevenLabs, the voice AI company valued at $11B after a $500M Series D in February 2026, have been working toward.

The company has been working steadily in this direction and recently announced that it will offer on-premises and on-device deployment of its voice models. On-premises means the models run on an organization’s own servers, inside its own data center, on confidential computing infrastructure (hardware that encrypts data even while it is being processed) with GPUs. On-device means the models run directly on the hardware itself: vehicles, wearables, embedded systems, anything that needs to generate speech without an internet connection. Both options are in early access, with initial releases expected in the first half of 2026.

This comes in addition to the cloud and VPC (virtual private cloud) deployments the company already offers, in which models run in a customer’s own AWS SageMaker or GCP Vertex account. With the new tiers, ElevenLabs covers a four-layer deployment spectrum designed for every enterprise environment, from fully connected to fully air-gapped.

For ElevenLabs, the announcement is not simply a rebranding of its existing models to run on edge devices. Rather, the company is developing purpose-built models for these deployments, reflecting a broader industry push to expand AI’s use in protected environments.

What ‘purpose-built’ means in practice

Take, for instance, ElevenLabs’ approach to solving the problem of data security. The company is carefully distinguishing between its on-premises and on-device offerings.

According to ElevenLabs, its purpose-built models were developed specifically for their target environments, not simply compressed from what runs in the cloud.

The on-premises path targets organizations, such as government agencies, that cannot procure cloud infrastructure in their required region. It runs on confidential computing infrastructure with GPUs and supports fully air-gapped deployments, meaning no external network connection at all.

The on-device path targets use cases requiring offline inference on constrained compute: automotive manufacturers embedding voice into vehicles, or hardware companies building voice into wearables. Those models are optimized for entry-level GPUs, NPUs (neural processing units, specialized chips designed for AI workloads), and modern CPU and ARM-based chips.

The company acknowledges that these models do not fully mirror its cloud portfolio in terms of capabilities, but says they reflect its highest quality standards. That trade-off between capability and control is inherent to on-device AI: running models locally means working within tighter memory, compute, and power constraints than those of a cloud data center.

Why now

For ElevenLabs and the wider industry, the time is ripe for the transition to on-device AI: two forces are converging, making the timing deliberate rather than incidental.

The first is regulatory, with the EU AI Act’s requirements for high-risk AI systems taking effect in August 2026. The act imposes obligations around data governance, risk management, technical documentation, human oversight, and conformity assessments. For voice AI used in healthcare triage, financial services, or government operations, those requirements are substantial. When a model runs entirely within the deployer’s infrastructure, the chain of data custody is shorter, the audit trail is simpler, and the deployer has direct control over the data governance environment that regulators will scrutinize.

Around the same time, in the United States, HIPAA’s protections for patient health information, the Gramm-Leach-Bliley Act’s financial data requirements, and federal security frameworks like FedRAMP create constraints that on-premises deployment directly addresses.

The second factor that makes the timing apt is that on-device AI inference has progressed from research curiosity to engineering discipline as the market has matured. The core arguments for local deployment, as ElevenLabs frames them: avoiding network round-trips in real-time systems where milliseconds matter, keeping data inside a controlled environment, and enabling offline operation in locations where cloud infrastructure is unavailable or prohibited.
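
The round-trip argument is easy to make concrete with a back-of-the-envelope latency budget. The numbers below are purely illustrative assumptions, not ElevenLabs figures; real values depend on the network, the model, and the chip.

```python
# Illustrative latency budget for real-time voice (all numbers assumed).
BUDGET_MS = 300  # rough threshold before a spoken reply starts to feel laggy

# Cloud inference pays network costs on every exchange, plus jitter.
cloud = {"uplink": 40, "queue": 10, "inference": 80, "downlink": 40}

# Local inference may run on a slower chip, but skips the network entirely.
local = {"inference": 120}

cloud_total = sum(cloud.values())  # network hops dominate the cloud path
local_total = sum(local.values())  # deterministic, no jitter

print(f"cloud: {cloud_total} ms, local: {local_total} ms, budget: {BUDGET_MS} ms")
```

The point is not that one column always wins, but that the cloud path carries two network legs whose latency varies per call, which is exactly what real-time systems cannot tolerate.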

All this is possible because model compression, quantization, and purpose-built architectures have matured to the point that smaller models optimized for constrained hardware can now handle practical tasks that previously required full cloud compute.
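
Quantization, one of the compression techniques mentioned above, can be sketched in a few lines. This is a minimal, assumed illustration of linear int8 quantization on toy numbers; production toolchains do this per-tensor or per-channel with calibration data.

```python
# Minimal sketch of post-training int8 quantization (toy numbers, no real model).

def quantize(weights, num_bits=8):
    """Map float weights onto signed integers via a linear scale."""
    qmax = 2 ** (num_bits - 1) - 1            # 127 for int8
    scale = max(abs(w) for w in weights) / qmax
    q = [round(w / scale) for w in weights]   # stored as small integers
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats at inference time."""
    return [x * scale for x in q]

weights = [0.37, -1.27, 0.051, 0.9]           # stand-in for a weight tensor
q, scale = quantize(weights)
recovered = dequantize(q, scale)
error = max(abs(a - b) for a, b in zip(weights, recovered))

# int8 storage is 4x smaller than float32; the rounding error is
# bounded by scale / 2 per weight.
print(q, scale, error)
```

The trade-off in the final comment is the whole game: smaller, cheaper models in exchange for a bounded loss of precision, which is why the quality question below matters so much for voice.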

As MIT Technology Review has noted, shipping data to the cloud creates latency that undermines time-critical applications. At the same time, on-device processing delivers responsiveness, privacy, and cost savings by ensuring sensitive data never leaves the hardware.

Beyond these, a closer look at ElevenLabs highlights a shift forcing much of the AI industry to rethink its strategy: the race to onboard as many enterprise clients as possible, as quickly as possible.

ElevenLabs is currently in the middle of an aggressive enterprise push. It has partnered with Deutsche Telekom to deploy voice agents across Europe’s largest telecom, integrated with IBM watsonx Orchestrate to bring text-to-speech and speech-to-text to enterprise agentic workflows, and signed its first Big Four consulting deal, with Deloitte, to deploy conversational agents at scale.

The company has also expanded its Google Cloud partnership, gaining access to NVIDIA Blackwell GPUs. On-premises and on-device are the next logical steps in reaching the enterprise customers these partnerships are designed for: those whose compliance requirements make cloud adoption a non-starter.

The competitive landscape

The competitive landscape around ElevenLabs is increasingly defined not by who has the best voice model, but by who can deploy it where the cloud cannot go. Across the industry, there is a growing convergence on a simple reality: in regulated environments, deployment architecture is as critical as model quality.

Some players approach this from the infrastructure side. Rasa has built its positioning around on-premise and private deployments, offering enterprises control and modularity, but often leaving them to assemble and optimize the full stack themselves. Others, like Avaya, are extending existing enterprise systems with AI capabilities, prioritizing reliability and compliance over cutting-edge voice quality.

What this creates is a clear divide. AI-native companies are moving toward enterprise-grade deployment, while enterprise incumbents are layering in AI. The advantage will lie with those who can bridge both—delivering high-quality voice without compromising on control or compliance.

What could complicate the picture?

However, even as the industry matures in pursuit of enterprise clients, several questions remain open.

The first and biggest question is how much enterprises are willing to sacrifice by moving off the cloud. Voice synthesis is an unforgiving domain: listeners detect subtle quality degradation in speech that they might tolerate in text generation. If the on-device models sound noticeably less expressive or natural than their cloud counterparts, adoption in customer-facing scenarios, where voice quality is the entire value proposition, could be limited.

Then there is the question of accountability. When voice cloning technology runs entirely behind a customer’s firewall, with no data reaching the developer, the developer loses much of its ability to monitor for misuse, enforce responsible-use policies, and detect deepfakes.

Finally, there is the question of execution. On-premises and on-device are new product categories for the industry; companies have built their confidence delivering cloud-native AI through APIs. Shipping models that run on customer-owned hardware, in environments the vendor cannot directly access or debug, with controlled update cycles and custom voice development, is a fundamentally different operational challenge.

What comes next

Looking through the lens of ElevenLabs, CEO Mati Staniszewski has publicly stated that an IPO is planned within two to three years. On-premises and on-device enterprise contracts, with their longer durations and higher average values, could improve the revenue predictability that public markets demand. But they also introduce a different kind of business: longer sales cycles, heavier implementation support, and the operational complexity of maintaining models across a fragmented set of deployment environments.

The broader question, which extends beyond ElevenLabs, is how companies operating in this space will manage the shift from cloud APIs to a triad of quality, compliance, and deployment flexibility.

What is clear, however, is that the companies that can deliver all three without forcing customers to choose between them will define the next phase of enterprise voice AI.

Headlines You Actually Need

Anthropic security story: reports of unauthorized access involving Claude; details of the incident remain unconfirmed.
Apple's John Ternus minefield: the challenges awaiting Apple's incoming CEO.
AI teaching us to listen: BBC Future on how AI can improve human listening skills.

The context to prepare for tomorrow, today.

Memorandum merges global headlines, expert commentary, and startup innovations into a single, time-saving digest built for forward-thinking professionals.

Rather than sifting through an endless feed, you get curated content that captures the pulse of the tech world—from Silicon Valley to emerging international hubs. Track upcoming trends, significant funding rounds, and high-level shifts across key sectors, all in one place.

Keep your finger on tomorrow’s possibilities with Memorandum’s concise, impactful coverage.

*This is sponsored content

Friday Poll

🗳️ ElevenLabs is going on-device. What's the real winner?

Login or Subscribe to participate in polls.

Meme Of The Day

Weekend To-Do

Run a local LLM: Install LM Studio and run an open-source model on your laptop, no cloud, no API key, just to feel the on-device shift firsthand.
Try on-device voice: Test Kyutai's Moshi or whisper.cpp to see how close offline voice AI is getting to cloud quality.
Audit your AI stack: Map every tool your team uses and flag which ones send sensitive data to a third-party cloud, the kind of exercise enterprise buyers are doing right now.

The Toolkit

Leonardo AI: AI image and video generator with fine-grained creative controls, built for designers, marketers, and game studios who need consistent style at scale.
Modal: Serverless cloud for running Python and AI workloads, lets you spin up GPUs in seconds without touching infrastructure.
Quillbot: AI writing assistant that paraphrases, summarizes, and rewrites text on demand, useful for tightening drafts or escaping your own voice.

Rate This Edition

What did you think of today's email?

Login or Subscribe to participate in polls.