The Parameter Myth

Plus: Slate rethinks its EV battery, Meta relaunches its creator app, Google's search dominance cracks.

Roko's Basilisk
June 29, 2026

Here’s what’s on our plate today:

🧪 Why parameter count is the wrong way to judge AI models.
📰 Slate rethinks its EV battery, Meta relaunches its creator app, and Google's search dominance cracks.
💡 Roko's Pro Tip: Stop buying AI by parameter count, buy by fit.
🗳️ Poll: What actually makes one AI model beat another?

Let’s dive in. No floaties needed…

The Laboratory

TL;DR

Bigger stopped meaning better: DeepMind's 2022 Chinchilla research found most large language models were undertrained, not underbuilt. A 70B-parameter model trained on enough data beat GPT-3's 175B and Gopher's 280B.
Sparse beats dense: Mixture of Experts architectures now power frontier models by activating only a fraction of their parameters per query. DeepSeek showed that a 16B MoE model matched a 7B dense model with 40% of the compute.
Small models are closing the gap, but not everywhere: Mistral's 7B model matched the reasoning performance of a Llama 2 model more than three times its size. Yet a June 2026 Nature Medicine study found general-purpose frontier models beating specialized clinical AI tools, proof that domain-specific training isn't always the safer bet.
The real signal enterprises miss: Parameter count remains the easiest number to market and the worst one to buy on. Training data quality, inference efficiency, and task fit matter more, and the gap between perception and reality is getting expensive.

Why parameter count is the wrong way to judge AI models

A common misconception among car enthusiasts is that more horsepower automatically means better performance. In reality, horsepower only matters in context. A heavy pickup truck needs far more power to accelerate quickly than a compact hatchback, while a lightweight sports car can deliver impressive performance with a much smaller engine. The number itself means little unless you consider the vehicle it is moving.

Like horsepower in a car, parameter count is only one measure of performance in AI models, and often not the most important one. Photo Credit: Getty Images.

Artificial intelligence has long suffered from a similar misunderstanding. In place of horsepower, AI models are measured in parameters, the numerical values that a model adjusts during training to learn patterns and make predictions. As models grew larger over the past decade, parameter counts became an easy shorthand for capability, creating the impression that more parameters inevitably yield better models.

That assumption became deeply embedded in both research and industry. Companies routinely compared models by their parameter counts, and enterprises often treated larger models as inherently more capable and therefore more valuable. Yet for most people working with AI, parameters remain an abstract concept, represented by numbers so large that they offer little intuitive sense of what a model can actually do.

The reality, however, is more complicated. Just as horsepower only becomes meaningful when considered alongside a vehicle's size and purpose, parameter counts tell only part of the story about an AI system. To understand why some models outperform much larger rivals and why smaller models can sometimes be the better choice, it helps to start with what a parameter actually is and how the industry's assumptions about scale have begun to change.

What are parameters?

At their core, parameters are the dials and levers that control how a large language model behaves. Think of them as settings on an enormously complex machine, each one adjusted through training to produce useful outputs. Engineers don't program these numbers directly; they emerge from exposure to vast quantities of training data.

Parameters fall into three broad categories: embeddings, weights, and biases. Embeddings turn words into numbers that a computer can work with. When a model encounters the word ‘king’, it isn't reading letters. It's recognizing a long list of numbers that place ‘king’ close to related concepts like ‘queen’ and far from unrelated ones like ‘refrigerator’.
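To make "close" and "far" concrete, here is a minimal sketch that compares word vectors by cosine similarity, a common way to measure how alike two embeddings are. The three-number vectors are made up for illustration; real models learn much longer ones during training.

```python
import math

# Made-up 3-dimensional embeddings, for illustration only.
# Real models learn vectors with hundreds or thousands of dimensions.
EMBEDDINGS = {
    "king": [0.90, 0.80, 0.10],
    "queen": [0.85, 0.90, 0.15],
    "refrigerator": [0.10, 0.05, 0.95],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


king_queen = cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["queen"])
king_fridge = cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["refrigerator"])
print(f"king vs queen: {king_queen:.2f}")
print(f"king vs refrigerator: {king_fridge:.2f}")
```

With these toy numbers, 'king' scores far higher against 'queen' than against 'refrigerator', which is the kind of geometry a trained model relies on.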

Weights determine how much each piece of information matters as it moves through the model. Each weight sets the strength of a connection between two parts of the model, amplifying or dampening the signal passed from one to the next, and the weights are adjusted continuously during training so the model improves over time.

Biases function more like sensitivity controls. They shift the threshold at which a part of the model fires, allowing a signal to trigger activity even when its value is low, similar to a knob that helps a listening device pick out quiet voices in a noisy room.
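A single artificial neuron shows how the two work together. This is a simplified sketch, not how any particular model is built: the inputs, weights, and bias values are hypothetical, and real networks use smoother activation rules than a plain on/off check.

```python
def neuron_fires(inputs, weights, bias):
    """Weighted sum of inputs plus bias; the unit 'fires' if the total is positive."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return total > 0


# Hypothetical values chosen only to illustrate the effect of the bias.
quiet_signal = [0.2, 0.1]
weights = [0.5, 0.5]

# Same weak signal, two different bias settings.
print(neuron_fires(quiet_signal, weights, bias=-0.3))  # strict threshold
print(neuron_fires(quiet_signal, weights, bias=0.0))   # relaxed threshold
```

The weights scale each input, while the bias decides how strong the combined signal has to be before anything is passed along.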

Modern AI systems run on billions of these parameters, and in some cases well over a trillion. GPT-4's exact parameter count has never been confirmed by OpenAI; when asked about a widely shared chart comparing GPT-3's 175B parameters to a rumored 100T for GPT-4, CEO Sam Altman called it “complete bullshit”. Regardless of the true numbers, training a model at that scale means adjusting each of these numbers over and over, at enormous computational cost.

The architecture that changed everything

The breakthrough that made modern AI possible was the transformer architecture, introduced in 2017. Transformers allow models to weigh the relevance of each word in a passage against every other word simultaneously, rather than reading one word at a time. In the sentence "The cat sat on the mat because it was tired," this mechanism helps the model infer that "it" refers to the cat, not the mat, by calculating how closely related the words are.

Before transformers, models processed text sequentially, one word at a time, making it harder to connect words that were far apart in a sentence or paragraph.

The transformer changed AI by allowing models to understand relationships between words simultaneously rather than one at a time. Photo Credit: NVIDIA.

Transformers rely on a technique called attention. Modern systems use a version known as multi-head attention, which performs several attention calculations in parallel, allowing the model to track different types of relationships between words simultaneously.
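For readers who want to see the arithmetic, here is a minimal sketch of one attention calculation (scaled dot-product attention) in plain Python. The two-number vectors and the query for "it" are assumptions picked so the example is easy to follow; a real model learns them, and multi-head attention runs several such calculations side by side.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for one query over a list of key/value vectors."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # Blend the value vectors according to how relevant each word is.
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output


# Hypothetical 2-dimensional vectors for three words; real models learn these.
words = ["cat", "mat", "it"]
keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
values = keys
query_for_it = [2.0, 0.0]  # assumed query that happens to align with "cat"

weights, _ = attention(query_for_it, keys, values)
for word, weight in zip(words, weights):
    print(f"{word}: {weight:.2f}")
```

In this toy setup the largest share of attention from "it" lands on "cat", mirroring the example sentence above.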

The result is a system that understands context far better than earlier architectures could, but one that also requires a very large number of parameters. That appetite for parameters also changed how the field would come to measure a model's ability.

When bigger stopped being better

Because larger transformer models often outperformed smaller ones, the AI industry spent years operating under a simple assumption: more parameters, more training data, and more computing power would reliably yield better results. Researchers at OpenAI, Google, and other labs competed to build ever-larger models, with each new release setting a parameter record.

That assumption began to break down in 2022, when researchers at DeepMind found that many large language models were simply too large for the data they had been trained on. The industry had been increasing parameter counts faster than it was increasing training data.

The Chinchilla paper showed that model size and training data need to grow together. For optimal results, each doubling of model size should be matched by a doubling of the training data. In practice, that works out to roughly 20 training tokens (pieces of text) per parameter.

The implications were immediate. DeepMind found that, for the same computing budget used to train Gopher, a model four times smaller trained on four times more data would have performed better. The resulting model, Chinchilla, had 70B parameters but outperformed Gopher's 280B, GPT-3's 175B, and several other larger contemporaries. The lesson was clear: parameter count alone does not determine capability. How effectively those parameters are trained matters just as much, and in some cases more.
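The arithmetic behind that trade can be sketched in a few lines. Two assumptions are not from the research as described here: training compute is estimated with the widely used rule of thumb of 6 x parameters x tokens, and the 300B-token figure for the larger model is used purely for illustration.

```python
TOKENS_PER_PARAM = 20  # rough ratio cited in the text above


def optimal_tokens(params):
    """Training tokens suggested by the ~20-tokens-per-parameter rule of thumb."""
    return TOKENS_PER_PARAM * params


def training_flops(params, tokens):
    """Approximate training compute using the common 6 * N * D estimate (an assumption)."""
    return 6 * params * tokens


B = 10**9  # one billion

chinchilla_params = 70 * B
print(optimal_tokens(chinchilla_params) // B, "billion tokens for a 70B model")

# Illustrative trade: shrink the model 4x, grow the data 4x, same compute.
big_params, big_tokens = 280 * B, 300 * B  # 300B tokens is an assumed figure
small_params, small_tokens = big_params // 4, big_tokens * 4
same_budget = training_flops(big_params, big_tokens) == training_flops(small_params, small_tokens)
print("same compute budget:", same_budget)
```

Under this estimate, a model a quarter of the size can see four times the data for the same bill, which is the swap DeepMind argued for.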

The sparse revolution

Since then, developments in model architecture have further complicated the relationship between parameters and performance.

Mixture-of-Experts (MoE) architectures introduced a new paradigm: not all parameters are active for every query. Rather than routing every input through the entire model, MoE sends different inputs to specialized subnetworks suited to the task at hand. A model might contain hundreds of billions of parameters and activate only a fraction of them on any given query.

DeepSeek's MoE research is a useful illustration of the trade-off: a 16B-parameter MoE model matched the performance of a dense 7B-parameter model while using only 40% of the computation. Total parameter count becomes a much less meaningful number when most of those parameters sit idle during inference.
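A toy router makes the idea of idle parameters easier to picture. Everything in this sketch is hypothetical: the number of experts, the parameter sizes, and the router scores are made up, and real MoE models route each token through learned gating layers rather than a hand-written list.

```python
def top_k_experts(gate_scores, k):
    """Return the indices of the k highest-scoring experts for one input."""
    ranked = sorted(range(len(gate_scores)), key=lambda i: gate_scores[i], reverse=True)
    return sorted(ranked[:k])


def active_params(shared_params, params_per_expert, k):
    """Parameters actually used for one input: shared layers plus k experts."""
    return shared_params + k * params_per_expert


# Hypothetical model: 8 experts, 2 chosen per input. All sizes are made up.
NUM_EXPERTS, TOP_K = 8, 2
SHARED, PER_EXPERT = 1_000_000_000, 1_000_000_000

gate_scores = [0.05, 0.40, 0.02, 0.10, 0.30, 0.03, 0.06, 0.04]  # assumed router output
chosen = top_k_experts(gate_scores, TOP_K)
total = SHARED + NUM_EXPERTS * PER_EXPERT
active = active_params(SHARED, PER_EXPERT, TOP_K)

print("experts used:", chosen)
print(f"active share of parameters: {active / total:.0%}")
```

The headline size of this imaginary model is 9B parameters, but only a third of them do any work on a given input.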

NVIDIA has reported that mixture-of-experts architectures now power the most capable frontier models, including several top performers on independent leaderboards, and that these models run significantly faster on newer hardware optimized for sparse activations. The shift reflects a broader change in how researchers think about model design: parameters still matter, but using them efficiently has become just as important as having more of them.

Small models, surprising results

Smaller models trained with care have produced some of the most striking results in the field. Mistral AI's own benchmarking found that its 7B-parameter model performed on reasoning and comprehension tasks at a level equivalent to a Llama 2 model more than three times its size. The gap that once seemed insurmountable between small and large models has narrowed considerably.

Smaller and more specialized doesn't always win, though. A June 2026 Nature Medicine study found that general-purpose frontier models outperformed specialized clinical AI tools across medical knowledge, clinician alignment, and real-world clinical queries, complicating the idea that domain-specific training is always the safer bet.

What actually matters

Just as horsepower only means something alongside a vehicle's weight, parameter count only means something alongside training data quality, architectural efficiency, and intended use case. A 7B-parameter model can outperform a 70B-parameter model on a specific task if it was trained on better data or built on a more efficient architecture.

For enterprises evaluating AI solutions, the practical implication is straightforward: parameter counts offer a rough signal of capability, but they shouldn't be the primary criterion for selection. The more useful questions are what data a model was trained on, how efficient it is at inference, and whether it performs well on the tasks that actually matter to the buyer.

The trajectory of the field points toward raw scale mattering less over time, not more. Researchers are increasingly finding ways to do more with fewer parameters rather than simply adding more. The next round of meaningful breakthroughs is more likely to come from models that use their existing parameters more intelligently than from models that chase the next order-of-magnitude increase in parameter count.

The lesson here extends beyond AI. In any complex system, a single metric rarely tells the whole story. Horsepower matters, but so does weight. Parameters matter, but so does how they're trained and deployed. The number on the spec sheet is only part of the story; understanding the rest means looking past it to find the right tool for the problem actually being solved.

Roko's Pro Tip

💡

Stop shopping for AI by parameter count. It's the horsepower number on a spec sheet, useless without knowing the weight it's moving. Ask what a model was trained on, how efficient it is at inference, and whether it's actually good at your task, then pick the smallest model that clears the bar.
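If you keep your own evaluation scores, the rule fits in a few lines of code. This is a rough sketch: the model names, sizes, and scores are hypothetical placeholders, and the scores are assumed to come from tests on your own task rather than public leaderboards.

```python
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    name: str
    params_billions: float
    task_score: float  # score on your own evaluation set, 0.0 to 1.0


def smallest_that_clears(candidates, min_score) -> Optional[Candidate]:
    """Return the smallest candidate whose task score meets the bar, or None."""
    passing = [c for c in candidates if c.task_score >= min_score]
    return min(passing, key=lambda c: c.params_billions, default=None)


# Hypothetical models and scores, not real benchmark results.
candidates = [
    Candidate("model-a", 7, 0.81),
    Candidate("model-b", 70, 0.86),
    Candidate("model-c", 400, 0.88),
]

choice = smallest_that_clears(candidates, min_score=0.80)
print(choice.name if choice else "no model clears the bar")
```

Raise the bar and the answer changes, which is the point: the threshold comes from your task, not from the spec sheet.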
