When AI Directs Video
Plus: Wishlist privacy leak, creepy smart glasses, and Xbox bets on cloud.
Here’s what’s on our plate today:
🧪 HeyGen and how AI video reshapes visual storytelling costs.
🧠 Amazon wishlist privacy bug, creepy smart glasses, Xbox cloud future.
💡 Roko’s Pro Tip on testing AI video before studio shoots.
📊 Monday Poll on how comfortable you are with AI video.
Let’s dive in. No floaties needed…

Deploy Models That Work in Production
This is the kind of talent you get with Athyna Intelligence: Research Engineer with deep expertise in PyTorch, deep learning, and LLM workflows. Built to ship models that hold up under real-world conditions.
Applied ML at scale
Fast iteration on real-world constraints
Production-first mindset
Part of our vetted LATAM PhD and Master’s network, working in U.S.-aligned time zones.
*This is sponsored content

The Laboratory
How generative AI is lowering the barrier for visual storytelling
Alongside audio and text, the visual medium is one of the most powerful ways to tell a story. Before cameras and screens, people relied on live forms such as puppet shows and stage plays, whose physical limitations constrained how a story could be told.
With technology and the evolution of special effects in cinema and animation, filmmakers learned to bend reality. From mechanical tricks and hand-crafted illusions to physics simulations and AI-assisted imagery, each technological leap expanded what stories could be told and how audiences could experience them.
From special effects to generative media
Today, companies like HeyGen are pushing the boundaries of visual storytelling with software, creating content without expensive cameras, studio crews, or on-camera talent.
On the surface, HeyGen is a platform that enables end users and enterprises to reduce video production costs through its video translation capability.
What HeyGen actually does
To better understand what the company does, consider a marketing manager at a company with customers in 15 countries who writes a two-paragraph product update script. She selects a digital presenter from HeyGen’s library of 230-plus AI avatars, pastes her script, and clicks generate. Within minutes, she has a polished video featuring a photorealistic presenter delivering her script, with lip movements synchronized to the spoken words.
Once the video is generated, she can use the translate feature to render the same video in 15 languages, with the presenter’s lip movements adjusted in each version to match the translated audio, as if the presenter were a native speaker of each language. All this can be done without additional filming, which could add weeks to production time and thousands of dollars in costs.
Beyond translation: avatars and video agents
Beyond video translation, HeyGen also develops tools for custom avatar creation (users submit a two-minute consent video and receive a digital twin of themselves), Interactive Avatar (a real-time streaming product for conversational AI presenters used in customer support, training, and sales), and Video Agent, which generates a complete, polished video from a single text prompt by autonomously handling scripting, avatar selection, B-roll sourcing from Google Veo and OpenAI Sora, voiceover, and editing.
Growth, revenue, and profitability
The influence of companies like HeyGen can be better understood by examining the numbers. The company reached $1M in annual recurring revenue in 178 days from its July 2022 product launch. By October 2023, that number had reached $10M. By June 2024, it crossed $35M, and the company raised a $60M Series A led by Benchmark at a $500M valuation. By September 2025, Sacra estimated ARR at $95M, serving more than 85,000 customers globally, including Zoom, SAP, and Reuters. Critically, HeyGen reached profitability in Q2 2023 and has maintained it since, which is rare for a company growing this aggressively.
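For readers who want to sanity-check the pace, here is a minimal Python sketch that turns the milestones above into growth multiples and an implied compound monthly rate. The exact dates are assumptions, since the figures are quoted only to the month: the launch is pinned to 1 July 2022 and each later milestone to the first of its month. The names `MILESTONES` and `growth_table` are made up for illustration.

```python
from datetime import date, timedelta

# ARR milestones quoted in the text, in millions of USD.
# ASSUMPTIONS (not from the source): the launch is pinned to 1 July 2022,
# and month-level dates are pinned to the first of the month.
LAUNCH = date(2022, 7, 1)
MILESTONES = [
    (LAUNCH + timedelta(days=178), 1.0),  # $1M ARR, 178 days after launch
    (date(2023, 10, 1), 10.0),
    (date(2024, 6, 1), 35.0),
    (date(2025, 9, 1), 95.0),             # Sacra estimate
]

AVG_DAYS_PER_MONTH = 365.25 / 12


def growth_table(milestones):
    """Return (end_date, multiple, implied_monthly_growth) per consecutive pair."""
    rows = []
    for (d0, arr0), (d1, arr1) in zip(milestones, milestones[1:]):
        months = (d1 - d0).days / AVG_DAYS_PER_MONTH
        multiple = arr1 / arr0
        # Compound monthly rate that would produce this multiple over the gap.
        monthly = multiple ** (1 / months) - 1
        rows.append((d1, multiple, monthly))
    return rows


if __name__ == "__main__":
    for end, multiple, monthly in growth_table(MILESTONES):
        print(f"to {end:%b %Y}: {multiple:.1f}x, ~{monthly:.0%} per month (implied)")
```

Under these assumed dates, the implied monthly rate falls at each step even as absolute ARR keeps climbing, which is the usual pattern as the revenue base gets larger.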
The company’s rapid growth is best understood by examining the timing of its launch and its ability to generate viral content for mass consumption. The company entered a market where video was already central to enterprise communication, yet producing professional content was still slow and expensive. Its timing coincided with AI avatars becoming visually credible, which meant customers were not experimenting with novelty but replacing real production costs and bottlenecks.
Another important factor was localization as a practical use case. While the need for multilingual video long predated HeyGen, its lip-sync translation turned a resource-intensive production effort into a simple workflow step. This shift directly aligned with the needs of global marketing, learning, and communications teams. Additionally, the company sustained product velocity through frequent releases that expanded capabilities while keeping pace with rapid competitive changes, turning execution speed into a form of defensibility.
Add to this the company’s ability to integrate with platforms like HubSpot, and HeyGen moved from a standalone tool to infrastructure that operates within enterprise workflows, reducing friction and increasing retention.
For the overall market, HeyGen has shown how application-layer companies can achieve growth while maintaining profitability.
And the competition is following quickly and closely.
Competition and platform pressure
HeyGen’s closest competitor, Synthesia, holds a $2.1B valuation, an estimated $146M in ARR, and relationships with 70% of Fortune 100 companies. Synthesia has built a more rigorous governance stack, with enterprise compliance features and tighter control over avatar consent, which supports its enterprise growth.
Then there are Adobe, Canva, and Google, all of which are working to democratize video generation capabilities. Adobe is integrating Firefly Video into Premiere Pro, Google is weaving Veo into Workspace Vids and YouTube, and Canva is including AI video in its low-cost Pro plan to reach a massive user base.
This shifts AI video from a separate purchase decision to a bundled capability, reducing switching costs to near zero and allowing incumbents to fund video features with broader platform revenues. Even Adobe’s reported interest in acquiring Synthesia suggests that large players view AI avatar video as an enhancement to existing suites rather than a standalone category commanding premium valuations.
These developments not only fuel competition but also push companies to improve their capabilities, ensuring that video and avatar generation continue to advance and that barriers to visual storytelling continue to crumble.
Risks, regulation, and the governance challenge
The industry as a whole faces difficult questions, including how to curb the misuse of generative AI capabilities for corporate fraud, political impersonation, and identity theft. The industry continues to invest in trust and safety teams, focusing on improving the ethics framework, consent requirements for custom avatar creation, and human moderation of flagged content.
The industry must also keep up with, and invest in complying with, a shifting regulatory environment.
The TAKE IT DOWN Act, signed into federal law on 19 May 2025, criminalizes non-consensual AI-generated intimate imagery and requires platforms to remove flagged content within 48 hours. The NO FAKES Act, pending in the Senate, would create a federal right against the unauthorized use of AI replicas of any person’s voice or likeness. The EU AI Act’s Article 50 imposes mandatory disclosure requirements on deployers of deepfake-capable AI systems, with fines of up to €15 million or 3% of global annual turnover for violations of these transparency obligations.
As of December 2025, 46 U.S. states have enacted legislation specific to deepfakes. In 2025 alone, 146 state-level bills addressed synthetic media. New York’s synthetic performer disclosure law, signed on 11 December 2025, requires advertisers to conspicuously disclose when AI-generated avatars appear in ads, effective June 2026.
Then there are challenges posed by efficiency gains that could eliminate jobs in the professional video production industry. These are questions that can only be answered as the long-term strategies and development of companies like HeyGen become clearer.
HeyGen has built a real business at genuinely impressive speed. It serves real enterprise buyers solving real problems. It has also built a tool whose misuse potential is growing faster than its safeguards can keep up. The next 24 months will test whether it can close the enterprise governance gap, defend against platform incumbents who view its core product as a feature, and satisfy regulators who are, for the first time, writing laws fast enough to actually matter.
In the meantime, HeyGen represents both the transformative and disruptive power of AI.


Roko Pro Tip
💡 If you experiment with AI avatar video, start where the stakes are lowest (think training or internal updates), and bake in clear disclosure and approvals before you ever touch customer-facing content.

Build your startup on Framer—Launch fast. Design beautifully.
First impressions matter. With Framer, early-stage founders can launch a beautiful, production-ready site in hours. No dev team, no hassle. Join hundreds of YC-backed startups that launched here and never looked back.
One year free: Save $360 with a full year of Framer Pro, free for early-stage startups.
No code, no delays: Launch a polished site in hours, not weeks, without hiring developers.
Built to grow: Scale your site from MVP to full product with CMS, analytics, and AI localization.
Join YC-backed founders: Hundreds of top startups are already building on Framer.
Eligibility: Pre-seed and seed-stage startups, new to Framer.
*This is sponsored content

Monday Poll
🗳️ How comfortable would you be using AI avatar video in your company’s external comms?

Bite-Sized Brains
Amazon wishlists leak addresses: Third-party sellers can see cities, states, ZIP codes, and recipient names on private wishlists, raising serious privacy and safety concerns for vulnerable users.
App spots smart glasses: A new research app called Nearby Glasses can detect and identify smart glasses over Bluetooth, letting bystanders see when nearby wearables might be recording.
Xbox future on the line: Tom Warren dissects Microsoft’s big Xbox shakeup, Phil Spencer’s exit, new leadership, and whether cloud gaming and Game Pass can still rescue a struggling console business.

The Toolkit
Deepgram: Speech recognition engine that turns calls and recordings into accurate transcripts and searchable voice data.
Descript: An AI video and audio editor that lets you edit by tweaking the transcript instead of a timeline.
Drift: This AI chat platform turns website visitors into qualified conversations and booked meetings in real time.






