New Fault Lines In Multimodal AI

Plus: OpenAI crisis alerts, Qwen exit & iPhone 17e upgrade.

Here’s what’s on our plate today:

  • 🧪 MiniMax’s multimodal gambit and the new AI battle lines.

  • 📰 Quick Bits: OpenAI crisis alerts, Qwen lead exit, iPhone 17e upgrades.

  • 🧰 Tools: Regie, Replit, and Sourcegraph for sales, coding, and search.

  • 📊 Poll: How fast will enterprises go fully multimodal?

Let’s dive in. No floaties needed…

Goodies delivered straight into your inbox.

Get the chance to peek inside the minds of founders and leaders and see how they think about going from zero to one and beyond.

Join thousands of weekly readers at Google, OpenAI, Stripe, TikTok, Sequoia, and more.

Check all the tools and more here, and outperform the competition.

*This is sponsored content

The Laboratory

MiniMax, multimodality, and the new fault lines of AI

TL;DR

  • MiniMax bet early that real machine intelligence must be multimodal, not text only.

  • Its IPO and tech stack show investors now see images, audio, and video as the next AI frontier.

  • The company runs a portfolio of specialized models tied together by a media agent for end-to-end workflows.

  • It sits in the crossfire of model distillation accusations, export controls, and US–China AI competition.

Yan Junjie founded MiniMax in 2021 on the belief that AI must be multimodal, a vision that now looks strikingly ahead of its time. Photo Credit: Bloomberg.

In a world where daily news stories cover every aspect of artificial intelligence, one might assume the meaning of intelligence would be clear and well understood. Yet ask the average reader of these stories what intelligence means, and you are unlikely to get a cohesive or concrete definition.

This lack of clarity does not stem from readers being unaware or uninformed; rather, the concept of intelligence itself has never been fully pinned down. For generations, philosophers have sought a concrete definition of intelligence, yet they have not agreed on one for humans or machines. That does not make intelligence a hopelessly abstract idea: even without a single definition, serious frameworks converge on a few core ideas about what intelligence means.

One core idea is that intelligence is the capacity of a system to acquire information, build internal representations of the world, reason about those representations, and flexibly use them to achieve goals across changing environments. In humans, this presents as a combination of abilities rather than a single trait. In artificial intelligence, it typically means a computer system that can perceive its environment and take actions that maximize its chances of achieving specified objectives. Contemporary AI systems satisfy these requirements, but they share a fatal shortcoming: most rely solely on text to perceive, reason, and act.

However, in the physical world, intelligence cannot reside in a computer system that can only interact with its surroundings via text.

This thought, or rather, the understanding that true machine intelligence cannot be achieved solely through text, led Yan Junjie to walk away from one of the most coveted positions in Chinese tech.

In December 2021, Junjie left SenseTime to start MiniMax and build AI systems that treat images, sound, video, and language as core abilities, not optional add-ons. At the time, his ideas were unpopular; today, his bet looks less like a gamble and more like a prophecy, and MiniMax is quickly becoming one of the most talked-about AI companies in the world.

From contrarian idea to market sensation

For many, MiniMax first caught their attention when it was listed on the Hong Kong Stock Exchange in January 2026. MiniMax’s shares opened at HK$235.40, about 43% above the IPO price of HK$165. It later climbed more than 113% intraday before closing at HK$345, leaving the company valued at roughly US$13.7B.

For context, OpenAI and Anthropic are still preparing for their potential public listings. Amid this backdrop, MiniMax joined the ranks of Chinese companies moving toward the stock market, backed by established names including Alibaba, Tencent, the Abu Dhabi Investment Authority, Mirae Asset Securities, and Boyu Capital.

Why multimodality is becoming inevitable

MiniMax’s successful IPO stands out in the AI investment landscape because it reflects the industry’s shifting focus away from text and toward multimodal capabilities. Driving that shift is a simple mismatch: the digital economy revolves around images, video, and audio, from social media to healthcare and manufacturing, yet most leading AI models remain text-centric systems with vision and other modalities bolted on as secondary capabilities.

And as MiniMax reaps the benefits of being the proverbial ‘early bird,’ others in the AI landscape are now acknowledging the importance of multimodality. OpenAI’s developers noted in their 2025 year-end report that multimodal AI had passed an important turning point, evolving from simply accepting images to powering end-to-end workflows that combine text, visuals, and audio within a single product experience.

The realization that the next phase of AI development will be on the multimodal front is driven not just by the race to achieve machine intelligence, but also by the technical realities of developing AI models.

Early AI models were trained on vast pools of text data; however, there is now a dearth of high-quality data, which in turn has slowed the rate of improvement. In this context, image, video, and audio have become the next major sources of learning and performance growth, creating a constraint for companies without access to large, high-quality multimodal datasets.

The shift, where companies like MiniMax have a head start, is now forcing major AI companies to rapidly change their strategies, and industry forecasts indicate how quickly this transition may unfold. According to Gartner, 80% of enterprise software will be multimodal by 2030, a dramatic rise from less than 10% in 2024, suggesting not a gradual evolution but a fundamental change in how software is designed and built.

MiniMax stands poised to take the lead in this era of AI development, as it does not rely on a single model; instead, it relies on a portfolio of models and the infrastructure that connects them, setting it apart in the AI race.

MiniMax’s technical strategy

MiniMax runs a suite of specialized models across text, vision, video, speech, and music, including its M2.5 reasoning model, which posts competitive results on software engineering benchmarks. These systems are unified through the company’s Media Agent, which lets users describe a task in plain language and automatically generates images, voice, music, and video within a single workflow, effectively turning conversation into a production interface.
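MiniMax has not published the Media Agent’s internals, but the pattern described above, one natural-language request fanned out to specialized models whose outputs are collected into a single workflow, can be sketched in a few lines. Everything below (the model stubs and the keyword planner) is a hypothetical illustration, not MiniMax’s actual system:

```python
# Hypothetical sketch of a "media agent": one plain-language request is
# planned into modalities, each served by a specialized model backend.
# Model names and the keyword planner are inventions for illustration.

MODELS = {
    "image": lambda prompt: f"[image generated for: {prompt}]",
    "speech": lambda prompt: f"[voiceover synthesized for: {prompt}]",
    "music": lambda prompt: f"[soundtrack composed for: {prompt}]",
    "video": lambda prompt: f"[video rendered for: {prompt}]",
}

def plan(request: str) -> list[str]:
    """Naive keyword planner: decide which modalities the request needs."""
    keywords = {
        "image": ("image", "poster", "thumbnail"),
        "speech": ("voice", "narration", "voiceover"),
        "music": ("music", "soundtrack"),
        "video": ("video", "clip", "ad"),
    }
    req = request.lower()
    return [m for m, kws in keywords.items() if any(k in req for k in kws)]

def media_agent(request: str) -> dict[str, str]:
    """Run each planned modality's model and collect the assets."""
    return {m: MODELS[m](request) for m in plan(request)}

assets = media_agent("Make a 15-second ad video with a voiceover and soundtrack")
print(sorted(assets))  # ['music', 'speech', 'video']
```

A production agent would replace the keyword planner with an LLM that decomposes the request and sequences the calls, but the dispatch shape is the same.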

Behind the interface, MiniMax uses a ‘Mixture of Experts’ architecture and a custom attention mechanism that allows its models to handle very large context windows while keeping compute demands relatively low compared with many rivals.
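MiniMax has not disclosed its architecture in detail, but the general Mixture-of-Experts mechanism is well known: a router sends each token to a small subset of expert networks, so only a fraction of the model’s parameters is active per token, which is what keeps compute low relative to total capacity. A minimal NumPy sketch with toy weights and sizes (nothing here reflects MiniMax’s real dimensions):

```python
import numpy as np

def moe_layer(x, experts, gate_w, top_k=2):
    """Route each token to its top_k experts and mix their outputs.

    x:        (tokens, d_model) input activations
    experts:  list of (w, b) pairs, one small feed-forward net per expert
    gate_w:   (d_model, n_experts) router weights
    """
    logits = x @ gate_w                     # (tokens, n_experts) router scores
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        top = np.argsort(logits[t])[-top_k:]        # indices of best experts
        weights = np.exp(logits[t][top])
        weights /= weights.sum()                     # softmax over chosen experts
        for w_gate, e in zip(weights, top):
            w, b = experts[e]
            out[t] += w_gate * np.tanh(x[t] @ w + b)  # weighted expert output
    return out

# Tiny demo: 4 experts, but only 2 run per token.
rng = np.random.default_rng(0)
d, n_experts = 8, 4
experts = [(rng.normal(size=(d, d)) * 0.1, np.zeros(d)) for _ in range(n_experts)]
gate_w = rng.normal(size=(d, n_experts))
x = rng.normal(size=(3, d))
y = moe_layer(x, experts, gate_w)
print(y.shape)  # (3, 8)
```

With top_k=2 of 4 experts, each token touches only half the expert parameters; scaled up, that gap between total and active parameters is the economic point of the design.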

However, as is the case with most technological advancements, MiniMax’s expertise did not evolve in a vacuum, and it is now at the center of a technical and political debate that could become an important part of the broader global AI race.

Controversy, competition, and geopolitics

The meteoric growth of MiniMax has raised concerns about the methods it and other Chinese companies use to achieve performance growth while minimizing compute loads.

On 24 February 2026, Anthropic accused MiniMax, DeepSeek, and Moonshot AI of extracting knowledge from its Claude models via large-scale distillation, a common technique in which smaller systems learn from stronger ones.
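Distillation itself is straightforward to sketch: a student model is trained to match a teacher’s output distribution (soft labels) rather than ground-truth hard labels. A toy NumPy illustration of the standard soft-label objective; the “teacher” and “student” here are random toy classifiers, not real Claude or MiniMax models:

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T                                # temperature softens the distribution
    z = z - z.max(axis=-1, keepdims=True)    # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, T=2.0):
    """Mean KL(teacher || student) on temperature-softened distributions,
    the standard soft-label distillation objective."""
    p = softmax(teacher_logits, T)           # teacher's soft targets
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p) - np.log(q))) / p.shape[0])

# The student queries the "teacher" and is fit to its answers:
rng = np.random.default_rng(1)
teacher_logits = rng.normal(size=(5, 10))    # 5 queries, 10 classes
student_logits = teacher_logits + rng.normal(scale=0.1, size=(5, 10))
close = distill_loss(student_logits, teacher_logits)
far = distill_loss(rng.normal(size=(5, 10)), teacher_logits)
print(close < far)  # a student that mimics the teacher scores lower loss
```

The dispute is not about this math, which is public and common, but about whether harvesting a rival’s outputs at scale through an API crosses from legitimate training into extraction.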

The dispute centers on scale and intent, with Anthropic alleging that MiniMax used millions of interactions through thousands of fake accounts to replicate Claude’s most valuable capabilities, quickly shifting traffic whenever new models appeared. For example, MiniMax allegedly redirected nearly half of its traffic to the new systems within 24 hours of Anthropic releasing a new Claude model.

Anthropic alleges “industrial-scale” model distillation by three Chinese AI labs, intensifying AI security and geopolitical tensions. Photo Credit: Financial Times.

While the claims are significant, the debate highlights a murkier reality in the industry: experts note that the line between acceptable distillation and improper model extraction is often unclear, especially in a sector where data use, scraping, and competitive advantage are already deeply contested. The episode has also reignited debate around training data practices more broadly, given that Anthropic itself settled a $1.5B lawsuit with authors and publishers over its own training data.

Then there are geopolitical tensions between the U.S. and China, and MiniMax sits right at the center of this exchange. The company reportedly generates over 70% of its revenue overseas and is navigating chip export restrictions and tariffs that could restrict its access to key markets, technologies, and customers.

What MiniMax signals about AI’s future

MiniMax’s trajectory not only reflects the direction AI companies are taking but also shows that companies built around text-only models must now reassess their systems, as rivals using AI that handles images, audio, and video may derive more value from the same information.

Vendor choice is also becoming more complex, since relying on either Chinese or U.S. providers carries geopolitical and regulatory risks depending on where a company operates. At the same time, the economics remain harsh, with even fast-growing firms like MiniMax reporting heavy losses, raising questions about how long low pricing for multimodal AI can last.

The AI industry is entering a phase in which the question is no longer whether models will be multimodal, but rather how quickly the transition occurs and who will bear the cost. Standalone image generators, voice synthesis tools, and video creation platforms face a stark future: integration into larger platforms or extinction.

MiniMax’s story, from a contrarian bet by a 32-year-old SenseTime executive to a $13.7B public company under international scrutiny, captures the speed and volatility of this moment.

The company may thrive as a multimodal infrastructure provider, or it may become an acquisition target for a larger tech ecosystem.

However, regardless of what MiniMax’s future holds, there is no denying that the company has played an important role in shaping the arc of computer intelligence.

The realization, the real-world decisions, the economic viability, and the success of MiniMax in the rapidly evolving AI landscape reflect a truth that can no longer be denied: intelligence in machines cannot rely solely on text, and multimodal systems are the closest bet we have to achieving true intelligence.

Quick Bits, No Fluff

  • OpenAI crisis contacts: OpenAI tests ‘trusted contacts’ in ChatGPT, letting users nominate people to be alerted when conversations suggest a possible mental health crisis.

  • Qwen leader exits: Alibaba’s Qwen tech lead resigns after a major AI push, spotlighting leadership churn and execution risks in China’s enterprise AI strategy.  

  • iPhone 17e basics: Apple’s new iPhone 17e holds the $599 price, adds MagSafe and 256GB base storage, but keeps a very basic 60Hz display.

The context to prepare for tomorrow, today.

Memorandum merges global headlines, expert commentary, and startup innovations into a single, time-saving digest built for forward-thinking professionals.

Rather than sifting through an endless feed, you get curated content that captures the pulse of the tech world—from Silicon Valley to emerging international hubs. Track upcoming trends, significant funding rounds, and high-level shifts across key sectors, all in one place.

Keep your finger on tomorrow’s possibilities with Memorandum’s concise, impactful coverage.

*This is sponsored content

Thursday Poll

🗳️ How fast will enterprises shift from text-only AI to fully multimodal systems?


3 Things Worth Trying

  • Runway: Full-stack video and image generation for experiments with multimodal workflows.

  • Pika Labs: A prompt-to-video playground that lets you quickly prototype short multimodal clips.

  • OpenAI GPT with Vision: Use image plus text inputs to test real multimodal use cases in your stack.

The Toolkit

  • Regie: AI-powered sales copilot to draft, personalize, and refine outbound messaging.

  • Replit: In-browser AI coding environment for writing, debugging, and running apps fast.

  • Sourcegraph: A code intelligence layer that lets AI search, understand, and refactor huge codebases.

Rate This Edition

What did you think of today's email?
