China’s AI Underdog Play

Plus: The future of frugal AI, Apple’s chip lock-in, Nissan’s ProPilot push, and more.

Here’s what’s on our plate today:

  • 🧪 How DeepSeek trained a GPT-level model for just $294k.

  • 📰 Apple’s AI chip plan, Nissan’s upgrade, and robotaxi battlegrounds

  • 🤖 The next big unlock in AI might not be power—it’s price.

  • 📊 Tuesday Poll: Should we be chasing AGI or optimizing what we have?

Let’s dive in. No floaties needed…

Presented by

It’s go-time for holiday campaigns

Roku Ads Manager makes it easy to extend your Q4 campaign to performance CTV.

You can:

  • Easily launch self-serve CTV ads

  • Repurpose your social content for TV

  • Drive purchases directly on-screen with shoppable ads

  • A/B test to discover your most effective offers

The holidays only come once a year. Get started now with a $500 ad credit when you spend your first $500 today with code: ROKUADS500. Terms apply.

*This is sponsored content

The Laboratory

How DeepSeek trained an AI model for pennies on the dollar

New technologies usually start out expensive and exclusive. Over time, costs fall as adoption grows and businesses step in to build infrastructure. Automobiles, personal computers, and cameras all followed this path.

However, in the case of artificial intelligence, the story may be fast-tracked. The common perception is that large language models are expensive to build, train, and run, with companies like OpenAI, Google, and Meta spending billions of dollars on infrastructure, researchers, and computing power. Yet a startup based in China has, on more than one occasion, grabbed headlines for training models at a fraction of the cost of its American counterparts.

DeepSeek recently shared that it spent just $294,000 training its latest model, reigniting the debate around the different approach it took and whether others should apply similar methods to bring down the cost of developing AI models.

DeepSeek’s cost canyon

In September 2025, Reuters reported that Chinese AI developer DeepSeek said it spent $294,000 on training its R1 model. The figure appeared in a peer-reviewed paper co-authored by DeepSeek’s founder, Liang Wenfeng, and published in the journal Nature.

The paper stated that the reasoning-focused R1 model cost $294,000 to train on 512 Nvidia H800 chips; an earlier version published in January did not contain this information. That is in stark contrast to the amounts spent by OpenAI, presently one of the leading AI companies in the U.S.

According to OpenAI’s CEO, Sam Altman, the total cost of training a foundational model can exceed $100 million. While OpenAI itself does not release exact figures, the very notion that foundational models can be trained for far less than what U.S. companies are spending could have major implications for future AI development and product pricing.

But before that happens, there is a need to take a closer look at how DeepSeek managed to train and update its AI models at a fraction of the cost spent by others.

The four steps of AI training

Faced with U.S. export restrictions and limited access to cutting-edge chips, DeepSeek had to reimagine each step of the standard training process.

Training an LLM involves four main steps:

  • Data collection: creators gather and clean massive amounts of text, code, and other data to ensure quality.

  • Pretraining: the model learns patterns by predicting or filling in text, a process that demands tremendous computing power.

  • Fine-tuning: the model is adapted for specific tasks with techniques like human feedback, where people rate or correct outputs to guide behavior.

  • Evaluation and deployment: the model is tested on new data for generalization, then deployed for real-world use, with ongoing updates and fixes as it encounters new situations.

That is the standard approach. DeepSeek, though, was constrained by hardware restrictions and had to find workarounds.
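For readers who prefer code to prose, here is a toy sketch of that four-step pipeline. Every function and the character-level "model" below are illustrative placeholders, not DeepSeek's (or anyone's) production code.

```python
# A toy sketch of the four-step pipeline described above. Everything here is a
# placeholder to show the shape of the workflow, not real training code.
from collections import Counter

def collect_and_clean(raw_docs):
    """Step 1: normalize whitespace, drop very short documents, deduplicate."""
    seen, cleaned = set(), []
    for doc in raw_docs:
        text = " ".join(doc.split())
        if len(text) >= 20 and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned

def pretrain(corpus):
    """Step 2: learn next-character statistics (a stand-in for next-token prediction)."""
    model = {}
    for text in corpus:
        for current, nxt in zip(text, text[1:]):
            model.setdefault(current, Counter())[nxt] += 1
    return model

def fine_tune(model, feedback):
    """Step 3: nudge the model toward human-preferred continuations."""
    for context_char, preferred_char in feedback:
        model.setdefault(context_char, Counter())[preferred_char] += 5  # crude reward
    return model

def evaluate(model, prompts):
    """Step 4: check behavior on held-out prompts before deployment."""
    results = {}
    for prompt in prompts:
        dist = model.get(prompt[-1])
        results[prompt] = prompt + (dist.most_common(1)[0][0] if dist else "?")
    return results

if __name__ == "__main__":
    docs = [
        "the cat sat on the mat today",
        "the cat sat on the mat today",          # duplicate, removed in step 1
        "to be or not to be, that is the question",
    ]
    model = pretrain(collect_and_clean(docs))
    model = fine_tune(model, [("q", "u")])       # reinforce 'q' -> 'u'
    print(evaluate(model, ["the cat sa", "to be or not to b"]))
```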

DeepSeek used 512 H800 chips, which are less powerful than the chips used by other companies but more readily available and cheaper in China. By building training systems around these chips, they avoided the massive expense of high-end GPUs.

The Chinese startup also lowered costs by reducing the hours spent on training the model. DeepSeek-R1 was trained for just 80 hours. By contrast, U.S. companies like OpenAI or Anthropic often run training jobs for weeks or even months.

Reducing training time without crippling performance meant they spent much less on energy and hardware usage.
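As a rough sanity check, the figures reported above (512 H800 chips, about 80 hours, $294,000) imply a per-GPU-hour spend in the single digits of dollars. The arithmetic below is purely back-of-the-envelope; DeepSeek has not published a detailed cost breakdown, and the real bill likely covers more than GPU time.

```python
# Back-of-the-envelope arithmetic using only the figures reported above.
# Purely illustrative: DeepSeek's actual cost accounting is not public.
chips = 512            # Nvidia H800 GPUs cited in the Nature paper
hours = 80             # reported training duration for R1
total_cost = 294_000   # reported training cost in USD

gpu_hours = chips * hours                 # 40,960 GPU-hours
implied_rate = total_cost / gpu_hours     # roughly $7.18 per GPU-hour

print(f"{gpu_hours:,} GPU-hours -> about ${implied_rate:.2f} per GPU-hour")
```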

DeepSeek also optimized the model during training to utilize fewer resources. This likely means they used techniques such as activating only parts of the model during training (instead of the whole thing at once), pruning unnecessary parameters, or using data more effectively so the model learns faster with less compute.
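Those are educated guesses rather than confirmed details, but one of them, sparse activation, is easy to illustrate. In a mixture-of-experts layer, each token is routed to only a few small expert networks, so most of the model's parameters sit idle for any given token. The PyTorch sketch below is a generic example of that pattern, not DeepSeek's actual architecture.

```python
# Generic sketch of sparse activation via top-k mixture-of-experts routing.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)    # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )

    def forward(self, x):                              # x: (tokens, d_model)
        scores = self.router(x)                        # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)     # keep only the top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):                     # only k experts run for each token
            for e in idx[:, slot].unique().tolist():
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out

tokens = torch.randn(16, 64)                           # a toy batch of token embeddings
print(TopKMoE()(tokens).shape)                         # torch.Size([16, 64])
```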

According to a Reuters report, DeepSeek also used distillation, a technique that involves an AI system learning from another AI system. The technique allows newer models to reap the benefit of the investment of time and computing power that went into building the earlier model, but without the associated costs. DeepSeek has, in the past, acknowledged that it used Meta's open-source Llama AI model for some distilled versions of its own models for this process.
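In its textbook form, distillation trains a smaller "student" model to match the softened output distribution of a larger "teacher" while still learning from ground-truth labels. The sketch below shows that standard loss with tiny stand-in models; it is a generic illustration, not DeepSeek's pipeline.

```python
# Generic knowledge-distillation loss: the student mimics the teacher's output
# distribution while also learning from the true labels. Illustrative only.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: temperature-scaled KL divergence against the teacher.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy usage with stand-in "models" (single linear layers over random features).
teacher = torch.nn.Linear(32, 10)
student = torch.nn.Linear(32, 10)
x = torch.randn(8, 32)
labels = torch.randint(0, 10, (8,))
with torch.no_grad():
    teacher_logits = teacher(x)           # teacher predictions are frozen
loss = distillation_loss(student(x), teacher_logits, labels)
loss.backward()                           # gradients flow only into the student
print(float(loss))
```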

So, while DeepSeek may have trained its models at a fraction of the cost of its American counterparts, the startup had to rely on earlier work to bring down the costs.

Another interesting thing to note is that while DeepSeek’s models may deliver performance on par with those from American companies, their end goals differ.

Cheaper chips, smarter training

The implications of DeepSeek achieving this performance through optimization are far-reaching. Since the startup released its first model, the industry outlook has shifted. Previously, training a state-of-the-art AI model was assumed to require exorbitant spending, which created barriers for researchers and companies in the Global South, where access to capital and computing is often limited.

DeepSeek’s was the first major AI model from Asia to parallel the performance of models coming out of the U.S. and Europe.

However, it is important to note that DeepSeek’s models chase optimization rather than the raw performance that U.S. companies pursue. OpenAI, Meta, and others have talked about and invested large amounts in the pursuit of artificial general intelligence, with the outlook that once AGI is developed, it will solve the problems of computing costs and environmental impact.

On the other hand, DeepSeek’s approach optimizes AI models for deployment rather than raw power. The startup also made its model open-source, which aids in democratizing the use of AI by allowing companies with limited resources to emulate and utilize its work.

So, while the companies in the U.S. and Europe are busy pushing the boundaries of AI models’ capabilities, those in Asia are looking to deploy the models with better optimization, rather than building the biggest one yet.

The global ripple effects

The recent paper in Nature is not the first time DeepSeek has grabbed headlines. In January 2025, the startup shook the tech world when it released a paper stating that training its V3 model required less than $6 million worth of computing power from Nvidia H800 chips. When the company released its chatbot based on the model, interest surged, and it overtook rival ChatGPT to become the top-rated free application on Apple's App Store in the United States.

Even within tech circles, DeepSeek’s models were praised for performance on par with models from OpenAI and Meta, both of which spent far more training their own.

While DeepSeek’s success drew rave reviews, some suggested the numbers were not possible and that DeepSeek had access to far more capable chips than it claimed.

Scale AI’s CEO, Alexandr Wang, was one such naysayer. He speculated that DeepSeek had 50,000 Nvidia H100 chips, which he claimed the startup was hiding because acquiring them would have violated export restrictions banning the sale of powerful AI chips to China.

The impact of DeepSeek, however, went beyond discussions. Its models, and the corresponding research paper, shook the stock market, with investors dumping tech stocks on worries that the emergence of a low-cost Chinese model would threaten the dominance of established AI leaders. Hardest hit was Nvidia: the chipmaker lost $593 billion in market value in a single day as sentiment shifted over how central its products would be to the development of future models.

Lessons from Henry Ford

DeepSeek’s latest revelations are an important step in the development and deployment of AI models. The company’s approach of optimizing the training pipeline used by Western tech companies could help secure the future scalability of AI.

Henry Ford did not invent the car, but his innovations and optimizations put motorized vehicles within reach of millions of people. Some of his ideas about optimizing workflows are still used to this day.

Similarly, DeepSeek’s approach to training foundational models may not always deliver frontier results, and its models may fail to compete with future releases from industry leaders like OpenAI, Google, and Meta. However, the chase for optimization may provide the blueprint for future startups that want to scale the use of AI, rather than continue chasing the ever-shifting target of AGI.

Roko Pro Tip

🚨 Bigger isn’t always better. 

As DeepSeek proved, smart training beats brute force. If you’re building or fine-tuning models, explore distillation, sparse activation, and shorter training cycles—especially if compute costs are a bottleneck.

Lead confidently with Memorandum’s cutting-edge insights.

Memorandum distills the day’s most pressing tech stories into one concise, easy-to-digest bulletin, empowering you to make swift, informed decisions in a rapidly shifting landscape.

Whether it’s AI breakthroughs, new startup funding, or broader market disruptions, Memorandum gathers the crucial details you need. Stay current, save time, and enjoy expert insights delivered straight to your inbox.

Streamline your daily routine with the knowledge that helps you maintain a competitive edge.

*This is sponsored content

Prompt Of The Day

Build a training plan for a high-performing LLM on a $300K budget.

Include your hardware stack, training schedule, and any efficiency techniques you’d borrow from DeepSeek.

Bite-Sized Brains

Tuesday Poll

🗳️ What’s the better long-term AI strategy?


Rate this edition

What did you think of today's email?
