On-Device AI, Explained
Plus: OpenAI’s big board move, Apple’s photo magic, and a romantic chatbot.
Here’s what’s on our plate today:
🧠 On-device AI is changing everything from smartphones to PCs.
🍿 Apple’s camera sleight of hand, OpenAI’s governance twist, and chatbot love.
💡 Roko’s Pro Tip — What you need to know before trusting offline AI tools.
📊 Monday Poll — Would you buy a device just for its on-device AI?
Let’s dive in. No floaties needed…

Stay at the forefront with daily Memorandum tech insights.
Memorandum distills the day’s most pressing tech stories into one concise, easy-to-digest bulletin, empowering you to make swift, informed decisions in a rapidly shifting landscape.
Whether it’s AI breakthroughs, new startup funding, or broader market disruptions, Memorandum gathers the crucial details you need. Stay current, save time, and enjoy expert insights delivered straight to your inbox.
Streamline your daily routine with the knowledge that helps you maintain a competitive edge.
*This is sponsored content

The Laboratory
How on-device AI is reshaping the deployment of models
When Carl Benz patented the first automobile in 1886, the true impact of the invention would have been hard to guess. Even the most progressive outlook would have failed to imagine a world where automobiles would become the focal point of communities and drive commerce to new heights. Even today, it is difficult to say what the future of mobility will look like. Will it be EVs, or will private vehicle ownership give way to public systems? It isn't easy to predict.
Today's generative artificial intelligence models are roughly where the automotive industry was in those early days. The technology has the potential to transform many aspects of society, but assessing which direction it will take is more guesswork than projection.
Today’s large language models run in dedicated data centers, which continuously exchange information with devices like laptops and smartphones. Built on the back of those data centers, this model has allowed companies like OpenAI, Meta, and Google to train models with hundreds of billions of parameters on trillions of tokens.
However, while this architecture is great for building ever more powerful AI models, in a fractured world where connecting to the cloud can be unreliable, a different approach is needed to bring AI tools to a large consumer base.
What is on-device AI?
On-device AI means exactly what the name suggests. Instead of sending data to a giant cloud server for every request, the AI model runs directly on devices like smartphones, laptops, cars, consoles, cameras, or factory gateways.
The approach sidesteps large data centers, relying instead on neural networks that work like generative AI models but are far smaller and can run on individual devices. Prioritizing on-device computing isn’t a gimmick; it’s a structural shift away from an architecture that requires tremendous resources to train and run AI models.
In practice, most modern systems are hybrid: do as much as possible on the device for speed and privacy, then hand off only the heavy jobs to a server.
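That hybrid pattern can be sketched in a few lines. This is a minimal, purely illustrative router; the function names and the "long context" heuristic are assumptions, not any vendor's actual API:

```python
# A toy sketch of the hybrid on-device/cloud pattern described above:
# serve small requests locally for speed and privacy, and fall back to
# a hosted model only for heavy jobs. All names here are hypothetical.

def run_local(prompt: str) -> str:
    # Stand-in for a small on-device model (e.g., a quantized LLM).
    return f"[local] {prompt[:40]}"

def run_cloud(prompt: str) -> str:
    # Stand-in for a network call to a large server-side model.
    return f"[cloud] {prompt[:40]}"

def route(prompt: str, needs_long_context: bool, online: bool) -> str:
    # Prefer the device; escalate only when the task exceeds local
    # capability AND a connection actually exists.
    if needs_long_context and online:
        return run_cloud(prompt)
    return run_local(prompt)

print(route("Summarize this note", needs_long_context=False, online=True))
print(route("Analyze this long report", needs_long_context=True, online=True))
print(route("Analyze this long report", needs_long_context=True, online=False))
```

Note how the offline case degrades gracefully to the local model, which is exactly the selling point for devices in areas with unreliable connectivity.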
One of the most promising uses of on-device AI can be seen in the new class of AI PCs, which pack dedicated neural processing units (NPUs) so they can run chatbots, image tools, and other workloads locally.
According to Reuters, research firm Canalys estimates that AI PCs, capable of processing data more swiftly than traditional PCs, will surpass 100 million units shipped in 2025, constituting 40% of all PCs shipped. Brands like Dell, Samsung, Lenovo, Asus, and Acer are marketing these machines, and the report notes that most come under Microsoft's Copilot+ branding. The figures highlight the rising use of on-device AI processing: it is no longer a niche sideshow but a mainstream architectural change.
It isn’t just PCs; on-device AI is allowing device makers and vendors to bypass the cloud for tasks and leverage AI to power features on smartphones and wearables.
Where on-device AI truly shines
Because on-device models can be far smaller, they are well suited to wearables and smartphones. Apple’s ‘Apple Intelligence’ is a clean illustration of the new split-brain architecture: it lets the company use on-device neural networks to power features that would otherwise require a trip to the cloud.
In this split architecture, a small model runs on the device, whether an iPhone, iPad, or Mac, while heavier requests fall back to Private Cloud Compute. This approach lets Apple claim not just strong performance but also strong privacy, while still handling tasks too big for a phone-sized model.
This splitting of tasks between on-device and cloud processing not only cuts latency, the time AI requests take to process, but also lets OEMs offer features in areas with unreliable network connections. The pattern works well enough that companies like Google use it with Gemini Nano for offline use cases like summarization and smart replies, cutting costs and hardening privacy.
However, while it is ideal for certain use cases, on-device AI has yet to reach its full potential. It is not a magic bullet that can replace data centers outright and bring down the costs of deploying AI models.
The challenges of on-device AI
Even the best NPUs and mobile GPUs can’t match a data center full of accelerators. And though vendors like Google and Apple are narrowing the gap with smarter runtimes and aggressive quantization, challenges remain.
Devices like smartphones, earbuds, and smartwatches were not designed with AI in mind. Their small physical footprint makes it difficult to include powerful NPUs. Additionally, constrained memory (RAM), limited storage, and lower compute precision are significant roadblocks to deployment.
Large models (hundreds of billions of parameters) simply cannot run efficiently on such hardware without severe compromises in performance or accuracy.
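A quick back-of-the-envelope calculation shows why. A model's weight footprint is roughly its parameter count times the bytes per parameter (this ignores activations and caches, which add more):

```python
# Rough weight-memory math for the constraint described above: even a
# "small" 7B-parameter model needs multiple gigabytes of RAM, which is
# why hundred-billion-parameter models cannot fit on phones.

def weight_footprint_gb(params_billions: float, bits_per_param: int) -> float:
    # parameters * bits / 8 = bytes; divide by 1e9 for gigabytes
    bytes_total = params_billions * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

# A 7B-parameter model at different precisions:
for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit: {weight_footprint_gb(7, bits):.1f} GB")
# 32-bit weights need 28 GB; even 4-bit quantization still needs 3.5 GB.
```

This is exactly why quantization (dropping from 32-bit floats to 8- or 4-bit integers) is the first lever vendors reach for.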
Then there is the question of energy. Running inference (or worse, training or fine-tuning) locally uses energy. This drains batteries faster. Also, heat and thermal throttling pose performance issues.
Edge devices need to balance performance with power usage so users aren’t sacrificing battery life for AI features. The inclusion of a vapor chamber in the iPhone 17 series is a testament to Apple’s efforts to overcome barriers to running as many AI processes on-device as possible.
And finally, to fit on-device, models are usually compressed through quantization, pruning, and distillation. But that often comes with some loss of accuracy or expressiveness; smaller or quantized models might struggle with long contexts, nuanced reasoning, or rare cases. It’s a tough balancing act. Updating existing models is another problem: while cloud models can be updated frequently and centrally, on-device models require pushing updates to many devices, ensuring compatibility, and avoiding bugs.
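To make the quantization trade-off concrete, here is a toy round trip from floats to 8-bit integers and back. It is a deliberately simplified sketch; real toolchains use far more sophisticated schemes (per-channel scales, calibration, and so on):

```python
# Toy post-training quantization: map float weights onto the signed
# 8-bit range [-127, 127] with a single scale factor, then reconstruct
# them and measure what the round trip lost.

def quantize_int8(weights):
    # Symmetric quantization: one scale chosen so the largest-magnitude
    # weight maps exactly to +/-127.
    scale = max(abs(w) for w in weights) / 127
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.82, -1.27, 0.003, 0.5601, -0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
error = max(abs(a - b) for a, b in zip(weights, restored))

print("quantized:", q)        # small integers, 1 byte each vs 4 bytes
print("max round-trip error:", error)
```

Each weight shrinks from 32 bits to 8, and the rounding error is bounded by half the scale factor. That error is exactly the "loss of expressiveness" described above: tiny distinctions between nearby weights are erased.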
Even if AI models can be trimmed down to be run on edge devices, there is the question of cost. Companies cannot rely solely on AI to market their products. Edge devices have to provide a holistic experience for users at a price that enables mass adoption. Without it, even the most advanced AI model running on an edge device will have a hard time making it beyond the testing phase.
The economics of it all
Regardless of whether LLMs are run in data centers in the cloud or on edge devices like laptops or smartphones, the economic incentives will play an important role in AI evolution.
Cloud providers charge users per use, and so far, companies have been investing massive amounts to build up their data center capabilities to meet the growing demand. This is fueled by the current hype around the use of AI to automate tasks and develop features. However, how many of them solve genuine real-world problems remains to be seen.
In the meantime, data centers continue to add to the existing challenges of managing the environmental impact of human activity. As such, the economics of building data centers rest on future profits that are yet to be realized.
By contrast, on-device AI shifts much of the cost upfront, turning it into a capital expense users pay when buying devices.
AMD has rolled out AI-centric laptop and desktop chips for the business market, joining Apple, Qualcomm, and Intel in the AI PC push. Microsoft has even pushed Copilot+ into cheaper SKUs to broaden access. This shift not only makes AI more accessible but also increases its appeal for individual users.
However, this approach too has its limitations. In the end, the model that solves the most problems and creates real value for consumers is the one that will dominate.
The ever-evolving face of technology
The first automobile took decades to reach the shape and form that most of us grew up with. Even now, the world of automobiles is undergoing rapid changes to address growing concerns of climate change.
If the constant shifts in the automotive industry are any indication, AI has only just begun to show its potential. The shape it ultimately takes—whether through cloud-based processing or on-device solutions—will largely depend on user preferences and economic factors.


Roko's Pro Tip
💡 Test on-device AI before you buy. Many tools now offer offline or hybrid modes; try apps like Gemini Nano (on Pixel) or Apple Intelligence previews. You’ll see where local beats cloud, and where it still lags.

Your next AI expert is just a click away.
AI is evolving fast—don’t fall behind. AI Devvvs connects you with top AI and ML professionals who bring real expertise to your projects.
From data science to robotics, we provide handpicked, vetted talent ready to deliver results. No more lengthy hiring processes or mismatched hires—just skilled professionals who integrate seamlessly into your team. Solve your toughest AI challenges with ease.
*This is sponsored content

Bite-Sized Brains
OpenAI gets Microsoft’s nod to go solo: Microsoft has agreed to OpenAI’s plan to shift its for-profit unit into an independent company.
Apple’s camera trick: Crop, don’t zoom. The iPhone 17 Pro’s ‘optical zoom’ is a smart crop—showing how Apple’s marketing leans on perception more than hardware changes.
People are catching feelings for chatbots: A growing number of users are forming emotional and romantic connections with AI companions—raising new questions about love and ethics.

Monday Poll
🗳️ Would you buy a phone or laptop just for on-device AI features?

Meme Of The Day

Rate This Edition
What did you think of today's email?
