Roko's Basilisk
From Falling Glass To Robots

Plus: iOS 27's practical AI, The Atlantic exposes AI training data, China's green AI hurdles.

Roko's Basilisk
June 23, 2026

Here’s what’s on our plate today:

🧪 Why AI video models are becoming a focus of robotics research.
📰 iOS 27's practical AI, The Atlantic exposes AI training data, and China's green AI hurdles.
💬 Prompt of the Day: break down renderers, simulators, and planners in world-model research.
🗳️ Poll: Is predicting the next frame a path to understanding the world?

Let’s dive in. No floaties needed…

Build and design your website on Framer - Now with Agents

Framer is a pro website builder trusted by companies like Miro and Perplexity, helping creators, teams, and businesses ship production-ready sites faster than ever. With AI agents built directly into the canvas, teams can design pages, manage CMS content, write copy, add SEO, and audit for issues — all without leaving the tool where the real site lives. Agents bring speed and scale; you bring taste, judgment, and control.

*This is sponsored content

The Laboratory

TL;DR

Frames are physics: Video models appear to infer gravity, acceleration, and shattering from footage rather than equations, and may be able to reason about consequences.
Reka's pivot: The chatbot maker absorbed Moonvalley's ex-DeepMind Veo team and rebranded around physical intelligence, building a World Language Action Model.
Everyone's converging: Fei-Fei Li raised $1B for World Labs, Yann LeCun left Meta to chase physics-first AI, and Runway shipped a world model. Different starts, one destination.
Skeptics call the term loose: Li's own team splits these systems into renderers, simulators, and planners, arguing that many so-called world models mostly just generate pixels.
The stakes: If video AI survives messy reality (the sim-to-real gap), chatbots become an early chapter; if not, the pivot is a bridge to nowhere.

Why AI video models are becoming a focus of robotics research

A glass perched near the edge of a table rarely inspires much thought, yet almost anyone watching it begin to tip can predict what will happen next. Long before it reaches the floor, the mind has already anticipated the outcome, drawing on an intuitive understanding, built over years of observing the physical world, of how objects move, collide, and respond to gravity.

A growing number of AI companies, including Reka, are betting that the technology behind video generation could become the foundation for machines that understand and navigate the physical world. Photo Credit: Tech Funding News.

That seemingly effortless ability to anticipate physical events is increasingly relevant to how researchers think about artificial intelligence. Evidence suggests that AI systems exposed to vast amounts of video may begin developing their own internal representations of the physical world by absorbing recurring patterns of motion, cause-and-effect, and object behavior. Rather than learning physics through equations, these models learn through observation, identifying regularities that allow them to anticipate what is likely to happen next.

What makes this particularly significant is that the same predictive capabilities that help AI generate realistic video may also help it navigate and interact with real-world environments, increasingly blurring the line between video-generation systems and the foundations of physical intelligence.

Why a falling glass matters to AI

The growing overlap between video generation and physical intelligence is no longer confined to academic research. One of the clearest indications emerged on June 11, when San Francisco-based Reka, previously known for building efficient enterprise chatbots, announced that it had absorbed the research team behind Moonvalley. Among the new arrivals are former Google DeepMind researchers Mateusz Malinowski and Mikołaj Bińkowski, who helped develop the technology behind Google's Veo video generator. Shortly after the acquisition, Reka stopped describing itself primarily as a chatbot company and instead began presenting itself as a lab focused on building foundational intelligence for the physical world.

From a distance, the pivot may look odd. Companies that built their reputations on chatbots usually compete where their expertise lies; they do not typically hire video researchers and suddenly start talking about robots and physical AI.

In Reka's case, however, the connection is more logical than it appears. To see why, it helps to understand what an AI video model actually does when it generates a video.

When a user asks an AI video model to generate a clip of a glass falling from the edge of a table, the system is not retrieving a stored video. It is predicting, frame by frame, what should happen next. To make the sequence look convincing, the model must learn something real about how objects behave, including that gravity pulls things downward, that objects accelerate as they fall, and that a glass striking a hard floor is more likely to shatter than bounce. Nobody explicitly programmed those rules. The model inferred them from observing vast amounts of video, much as humans develop intuition about the physical world through experience.
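
To make that idea concrete, here is a deliberately tiny sketch in Python. It is not how Veo or any production video model works: real systems predict pixels with large neural networks, while here a "frame" is reduced to a single number, the glass's height. The toy is never told the value of gravity. It only assumes that the next frame depends on the previous two, measures from a few synthetic "clips" how much the frame-to-frame motion changes, and then predicts a new fall frame by frame until the glass reaches the floor. The frame rate, heights, and function names are all hypothetical.

```python
# Toy sketch only: a "frame" here is a single number (the glass's height in
# metres), not an image. All names and constants are hypothetical.
from typing import List

FRAME_DT = 1 / 30  # assumed frame interval: 30 frames per second


def record_drop(start_height: float, frames: int, g: float = 9.81) -> List[float]:
    """Synthesize "footage": the height of a dropped glass at each frame."""
    return [start_height - 0.5 * g * (i * FRAME_DT) ** 2 for i in range(frames)]


def learn_regularity(clips: List[List[float]]) -> float:
    """Estimate how the per-frame displacement changes, from observed clips only.

    The learner never sees g. It averages the second difference
    y[t+1] - 2*y[t] + y[t-1] over every frame of every clip.
    """
    diffs = [
        clip[t + 1] - 2 * clip[t] + clip[t - 1]
        for clip in clips
        for t in range(1, len(clip) - 1)
    ]
    return sum(diffs) / len(diffs)


def rollout(prev: float, curr: float, change: float,
            floor: float = 0.0, max_frames: int = 10_000) -> List[float]:
    """Predict heights frame by frame from two observed frames until the floor."""
    frames = [prev, curr]
    while frames[-1] > floor and len(frames) < max_frames:
        # Next frame = current position + last displacement + learned change.
        nxt = frames[-1] + (frames[-1] - frames[-2]) + change
        frames.append(max(nxt, floor))  # the glass cannot pass through the floor
    return frames


if __name__ == "__main__":
    clips = [record_drop(h, 20) for h in (0.8, 1.0, 1.2)]
    change = learn_regularity(clips)
    print(f"implied gravity: {-change / FRAME_DT ** 2:.2f} m/s^2")
    first_two = record_drop(0.75, 2)
    print(f"predicted frames until impact: {len(rollout(*first_two, change))}")
```

What the toy leaves out is the point: it never encodes gravity, only the assumption that motion is smooth from one frame to the next. The value it recovers (about 9.81 m/s² for these synthetic clips) comes entirely from the data, which is the same bet, at vastly larger scale, that video models make.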

This ability to infer how objects behave, learned from large video datasets, has implications far beyond video generation. A model that can predict how a scene will evolve can also begin reasoning about the consequences of actions within that scene. Rather than simply rendering a falling glass, it could simulate what would happen if someone pushed the glass in the first place. As researchers increasingly recognize, predicting the next frame of a video and predicting the next moment in the physical world are closely related problems, which helps explain why advances in AI video generation are attracting attention from people working on robotics and physical intelligence.
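
The same kind of prediction can be turned around to ask "what if?" before acting. The sketch below is again a toy under loudly stated assumptions: a hand-written sliding rule (constant friction, hypothetical numbers) stands in for what would, in a real system, be a learned world model. The planner imagines each candidate push frame by frame, checks whether the glass is predicted to end up past the table edge, and keeps the strongest push that is predicted to be safe.

```python
# Toy sketch: predict_slide stands in for a learned world model.
# The friction value, frame rate, and function names are hypothetical.
from typing import Iterable, Optional

FRAME_DT = 1 / 30  # assumed frame interval: 30 frames per second


def predict_slide(x: float, push_speed: float, friction_decel: float = 3.0) -> float:
    """Imagine, frame by frame, where a pushed glass comes to rest (metres)."""
    if friction_decel <= 0:
        raise ValueError("friction_decel must be positive or the glass never stops")
    v = push_speed
    while v > 0:
        x += v * FRAME_DT               # move for one frame
        v -= friction_decel * FRAME_DT  # friction slows it down
    return x


def choose_push(x: float, edge: float, candidates: Iterable[float]) -> Optional[float]:
    """Return the strongest candidate push predicted to keep the glass on the table."""
    safe = [s for s in candidates if predict_slide(x, s) < edge]
    return max(safe) if safe else None


if __name__ == "__main__":
    # Glass at 0 m, table edge 0.3 m away; push speeds in m/s.
    print(choose_push(0.0, 0.3, [0.5, 1.0, 1.5, 2.0]))
```

With these made-up numbers the script prints 1.0: the planner rejects the 1.5 and 2.0 m/s pushes because it predicts they would carry the glass over the edge. Real environments are far messier than one friction constant, which is where the sim-to-real gap comes in.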

That is why Reka's move is more than a costume change. The researchers it hired helped teach machines to predict what comes next on a screen, and Reka wants them to apply that same skill to the real world.

Reka's bet on physical intelligence

According to the company's own announcement, Malinowski framed the goal as moving beyond generating video to understanding how the physical world works: simulating motion and physics so that machines can reason about consequences before they act. The project even has a name, the World Language Action Model, which is trained partly on footage shot from a person's own point of view and on recordings of robots moving through space.

What makes the thesis more convincing is that Reka is not the only one making this connection and acting on it.

Across the field, researchers are arriving at the same conclusion, and their moves reflect how widespread this thinking has become.

Different companies, same destination

Fei-Fei Li, one of the most respected figures in AI, raised $1B in February for her company World Labs, which is chasing what she calls spatial intelligence: machines that grasp three-dimensional space the way people do.

Yann LeCun, who spent more than a decade at Meta, most recently as its chief AI scientist, left to start his own lab built around the same conviction: that systems trained only on text have a ceiling, and that real intelligence has to understand physics.

Runway, a video company, released its first world model in December and began discussing applications in fields like medicine and energy, not just movies. The starting points differ, but the destination is the same.

Where theory collides with reality

However, there is considerable distance between finding the connection and actually making it work in the real world.

The physical world is filled with poor lighting, unexpected obstacles, shifting conditions, and countless small variations that no training dataset can fully capture. Researchers understand this challenge and often describe it as the "sim-to-real" gap, a term that refers to the tendency of systems that perform impressively in controlled simulations to struggle when exposed to the unpredictability of reality.

Not everyone in the field agrees that video-trained AI models are on a direct path toward understanding the physical world. Some researchers argue that the term "world model" is being used too loosely, grouping systems that perform very different tasks.

In a June 3 essay, Fei-Fei Li and the World Labs team proposed a framework that separates AI systems into renderers, simulators, and planners. They argue that many of the systems currently described as world models are primarily renderers, meaning they generate realistic images and videos without necessarily modeling the deeper structure of the world those images depict.

The debate reflects a broader lack of consensus across AI research. A recent survey of world-model research found that the term is used differently in robotics, reinforcement learning, video generation, and multimodal AI, often referring to related but distinct capabilities.

Yet the central question remains unchanged. Can the capabilities that enable modern AI systems to generate increasingly realistic videos eventually be extended to systems that understand, predict, and interact with the physical world? That is what makes Reka's recent hiring spree noteworthy. The researchers it recruited helped push the boundaries of AI video generation, a field that has become remarkably good at producing footage that looks physically plausible. Whether that expertise can be translated into systems that reliably understand and act in real environments remains uncertain, but the industry is now actively trying to answer that question.

The next chapter of AI

Viewed through that lens, the significance of Reka's seemingly routine corporate announcement extends beyond a single startup. For the past few years, the public face of AI has largely been the chatbot, a system designed to read, write, and converse. Increasingly, however, some of the industry's most prominent researchers and companies are pursuing a different ambition: building systems that can develop an understanding of the physical world and eventually operate within it. The growing interest in world models reflects the belief that learning to predict how objects, people, and environments change over time may be a necessary step toward that goal.

Whether that belief proves correct remains uncertain. The coming years will reveal whether the capabilities emerging from video generation research can overcome the practical challenges that have long limited robotics and embodied AI. If they can, today's chatbots may come to look like an early chapter in a much larger story. If they cannot, the recent wave of companies repositioning themselves around physical intelligence may ultimately be remembered as an ambitious attempt to build a bridge that never quite reached the other side.

Tuesday Poll

🗳️ Video AI and physical intelligence are converging. Is predicting the next frame really a path to understanding the world?

Outperform the competition.

Business is hard, and sometimes you don’t have the tools you need to be great at your job. Open Source CEO is here to change that.

Tools & resources, ranging from playbooks, databases, courses, and more.
Deep dives on famous visionary leaders.
Interviews with entrepreneurs and playbook breakdowns.

Are you ready to see what it’s all about?