Roko's Basilisk
Europe Versus AI Scrapers
Plus: TikTok’s Oracle move, Tesla bots, and ChatGPT Health.
Here’s what’s on our plate today:
🧪 The Laboratory: EU’s plan to tame AI web scraping.
🧠 Brain Snack: Builders prepping for opt-out web standards & audits.
⚡ Quick Bits: TikTok’s Oracle pivot, Tesla robots, ChatGPT Health privacy.
📊 Poll: Should scraping without explicit consent be outright banned?
Let’s dive in. No floaties needed…

Introducing the first AI-native CRM
Connect your email, and you’ll instantly get a CRM with enriched customer insights and a platform that grows with your business.
With AI at the core, Attio lets you:
Prospect and route leads with research agents
Get real-time insights during customer calls
Build powerful automations for your complex workflows
Join industry leaders like Granola, Taskrabbit, Flatfile and more.
*This is sponsored content

The Laboratory
How the EU could put an end to web scraping
In many industries, those who extract raw materials capture the smallest share of the value created along the supply chain.
Look at the coffee industry. Coffee farmers often receive 5–10% of the final retail price of a cup of coffee, while most of the value is captured by roasters, brands, distributors, and cafés that process, market, and sell it.
This means that processing is where the majority of value is generated.
While it is easy to see how processing adds value to raw materials in the coffee industry, things are far more complicated in tech, especially when it comes to identifying the raw materials needed to successfully train and deploy an AI model.
The end of the wild west: From TDM Exceptions to rigid enforcement
For nearly a decade, AI companies and the open web have operated under an unspoken truce: don’t ask, don’t tell. Under the text and data mining (TDM) exception in the EU’s 2019 Copyright Directive, scraping data was effectively allowed unless the rightsholder explicitly reserved their rights, that is, said “no.” That worked when models were small. It doesn’t anymore. When trillion-parameter models ingest the entire web, “asking them to stop” becomes meaningless for most creators.
That is what the European Commission is now trying to fix. It is currently finalizing a landmark consultation (concluding in January 2026) to define what counts as a machine-readable opt-out under the EU AI Act.
The consultation, meant to align copyright law with the EU AI Act, signals the end of the gray area.
By early 2026, the debate is no longer about whether creators can opt out, but about how they can do so. The stakes now go beyond copyright. If an AI company ignores a recognized do-not-train signal, the Commission increasingly views this not just as infringement, but as a failure of transparency and governance under the AI Act.
The technical tussle: Why robots.txt isn't enough
For years, robots.txt functioned as the internet’s gentleman’s agreement. It told search engines which pages to avoid, and mostly, they listened. But AI training is different. Search engines send users back to websites; large language models often replace the need to visit them at all.
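To make the limitation concrete, here is a minimal sketch using Python’s standard-library `urllib.robotparser`. The robots.txt content and the URL are illustrative placeholders; `GPTBot` (OpenAI), `Google-Extended` (Google’s AI-training control) and `CCBot` (Common Crawl) are published crawler tokens. Every rule is keyed to which crawler is asking, never to what the fetched text will be used for.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: known AI-training crawler tokens are blocked,
# every other crawler (e.g. search bots) falls through to the "*" group.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

def crawl_permissions(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Return, for each user-agent token, whether robots.txt lets it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}

if __name__ == "__main__":
    url = "https://www.example.com/articles/local-election"  # placeholder URL
    perms = crawl_permissions(ROBOTS_TXT, url, ["Googlebot", "GPTBot", "Google-Extended", "CCBot"])
    for agent, allowed in perms.items():
        print(f"{agent:16} {'allowed' if allowed else 'blocked'}")
```

A publisher can stay in Google Search while blocking `Google-Extended` only because Google chose to split its tokens; a crawler that uses one identity for both search and training leaves publishers no such middle ground.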
That’s why new, AI-specific standards are emerging. The Commission is reviewing proposals like llms.txt, which provides model-readable summaries, and ai.txt, which lets publishers distinguish between training and inference use.
The challenge is choosing something that’s simple enough for a small local paper, yet robust enough for an AI lab scraping the entire internet.
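The training-versus-inference distinction can be sketched as a per-use policy that even a small paper could write by hand. The two-line syntax below is invented for illustration and is not the actual ai.txt or llms.txt format; `Use`, `parse_policy` and `may_use` are hypothetical names. It follows the opt-out logic described above: a use is permitted unless the publisher explicitly reserves it.

```python
from enum import Enum

class Use(Enum):
    TRAINING = "training"
    INFERENCE = "inference"

# Toy syntax invented for this example (NOT the real ai.txt/llms.txt format).
EXAMPLE_POLICY = """\
# A small local paper: no training, but AI answers may quote us.
training: disallow
inference: allow
"""

def parse_policy(text: str) -> dict[Use, bool]:
    """Map each recognised use to True (allowed) or False (reserved)."""
    policy: dict[Use, bool] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = (part.strip().lower() for part in line.split(":", 1))
        if value not in ("allow", "disallow"):
            continue                          # ignore malformed values
        try:
            policy[Use(key)] = value == "allow"
        except ValueError:
            continue                          # ignore unknown use types
    return policy

def may_use(policy: dict[Use, bool], use: Use) -> bool:
    # Opt-out model: anything not explicitly reserved is treated as allowed.
    return policy.get(use, True)
```

With `EXAMPLE_POLICY`, `may_use(parse_policy(EXAMPLE_POLICY), Use.TRAINING)` is `False` while the inference check is `True`. Defaulting to "allowed" mirrors the TDM opt-out model; a more cautious crawler could flip that default.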
Why companies are taking this seriously
For AI firms, this consultation is not academic. Under the AI Act, penalties for breaching transparency obligations can reach 15 million euros or 3% of global annual turnover, whichever is higher, and these penalties are due to apply by 2026. For large technology companies, that is a material financial risk.
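Because the cap is whichever of the two figures is higher, the fixed amount only binds for smaller firms; for large ones the percentage dominates. A minimal sketch of that ceiling arithmetic, with hypothetical turnover figures and a hypothetical helper name:

```python
def max_fine_eur(worldwide_turnover_eur: int) -> int:
    """Ceiling of the fine tier: EUR 15 million or 3% of worldwide annual
    turnover, whichever is higher. Integer euros avoid float rounding.
    Simplification: ignores special rules for SMEs and every factor
    regulators weigh when setting an actual fine (which may be far lower)."""
    return max(15_000_000, worldwide_turnover_eur * 3 // 100)

if __name__ == "__main__":
    for turnover in (200_000_000, 100_000_000_000):  # hypothetical firms
        print(f"turnover EUR {turnover:,} -> ceiling EUR {max_fine_eur(turnover):,}")
```

For a firm with €200 million in turnover, 3% is €6 million, so the €15 million figure sets the ceiling; at €100 billion, the ceiling rises to €3 billion.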
More severe is the possibility of forced remediation. If regulators determine that a model was trained on data collected in violation of valid opt-out signals, they may require that model to be withdrawn or retrained.
That could erase hundreds of millions of dollars in computing and research investment. As a result, data provenance, meaning knowing exactly where training data came from, has become a board-level concern rather than a back-office technical issue.
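In practice, knowing where training data came from starts with a per-document record captured at fetch time. A minimal sketch; the field names and the `ProvenanceRecord`/`make_record` helpers are hypothetical, not a schema any regulator has prescribed.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class ProvenanceRecord:
    url: str
    fetched_at: str                  # ISO 8601 timestamp, UTC
    content_sha256: str              # fingerprint of the exact bytes ingested
    robots_allowed: bool             # robots.txt verdict at fetch time
    optout_signals: tuple[str, ...]  # any other opt-out signals seen, verbatim

def make_record(url: str, content: bytes, robots_allowed: bool,
                optout_signals: tuple[str, ...] = ()) -> ProvenanceRecord:
    """Capture where a document came from and which signals applied when fetched."""
    return ProvenanceRecord(
        url=url,
        fetched_at=datetime.now(timezone.utc).isoformat(),
        content_sha256=hashlib.sha256(content).hexdigest(),
        robots_allowed=robots_allowed,
        optout_signals=tuple(optout_signals),
    )

if __name__ == "__main__":
    rec = make_record("https://www.example.com/articles/1", b"<html>...</html>", True)
    print(json.dumps(asdict(rec)))  # one JSON line per document in a manifest
```

Because the hash pins the exact bytes, a team can later locate a specific document in its training sets if a site’s opt-out is disputed, which is the first step toward any remediation.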
Europe versus the United States
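This regulatory approach also widens the gap between the EU and the U.S. While American courts continue to debate whether AI training qualifies as fair use, Europe has chosen to resolve the issue through regulation. This creates a strong Brussels Effect, in which EU rules become de facto global standards because multinational companies find it easier to apply one policy everywhere than to maintain separate ones.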
If a company wants to operate in the European market, which is the world’s largest unified digital economy, its crawlers must respect the EU opt-out registry everywhere, not just on European domains.
As noted by the IAPP, this effectively exports European copyright sensibilities to the rest of the world. A global publisher in New York or Tokyo can now use an EU-recognized header to protect their work from being scraped by a Silicon Valley lab, knowing that the lab must comply to maintain its European license.
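In crawler code, respecting opt-outs "everywhere, not just on European domains" means the check must not branch on where a site is hosted. A minimal sketch; the registry contents, the `x-example-ai-optout` header and the `training_allowed` helper are all hypothetical, since no specific registry or header format is assumed here.

```python
from urllib.parse import urlsplit

# Hypothetical local snapshot of an opt-out registry: hostnames that have
# reserved their content from AI training. Note the non-European domains.
OPTOUT_REGISTRY = {"news.example.com", "studio.example.jp"}

# Hypothetical response header; not a name the EU has adopted.
OPTOUT_HEADER = "x-example-ai-optout"

def training_allowed(url: str, response_headers: dict[str, str]) -> bool:
    """True unless the host is registered or the response carries the opt-out header."""
    host = (urlsplit(url).hostname or "").lower()
    # Deliberately no check on TLD, IP geolocation or server region:
    # the same rule applies to a site in Brussels, New York or Tokyo.
    if host in OPTOUT_REGISTRY:
        return False
    headers = {name.lower(): value.strip().lower() for name, value in response_headers.items()}
    return headers.get(OPTOUT_HEADER) != "true"
```

A Tokyo portfolio site listed in the registry is skipped exactly as a `.eu` site would be; the only inputs are the registry and the response itself.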
The risk for small creators
One major concern remains unresolved: complexity. Independent artists, writers, and small publishers worry that machine-readable opt-outs will favor large organizations with legal and technical teams. Managing headers, registries, or evolving standards requires skills many creators do not have.
Critics warn this could lead to a two-tier internet: one tier where large media companies are fully protected, and another where smaller creators remain exposed by default. The Commission’s final challenge is to ensure that machine-readable protection does not become a barrier to entry, and that safeguarding creative rights does not require creators to become engineers.
Democratizing AI gains
AI is quickly becoming one of the most powerful economic forces of this century. It is already reshaping labor, capital allocation, and competitive advantage across industries. Until now, the dominant assumption has been that AI itself is the product, and everyone else simply adapts to it.
But AI is not created in a vacuum. Its most critical input is human-generated data. Text, images, code, and ideas were produced long before any model existed. That data has been treated as free because it was easy to take, not because it lacked value.
As models grow larger and more data-hungry, that imbalance becomes impossible to ignore. Regulation is now stepping in where norms failed. The EU’s approach forces a simple question back into the system: who decides how value is extracted, and who gets paid for it?
If done right, this shift does not end AI innovation. It corrects it. It gives creators a real choice: participate and be compensated, or opt out entirely. Unlike the coffee farmer at the bottom of the value chain, creators are no longer silent suppliers. They become economic actors again.


Quick Bits, No Fluff
TikTok Oracle Pivot: ByteDance is reportedly weighing a U.S. venture with Oracle hosting TikTok’s data and algorithms to dodge a full ban, but key control questions remain unresolved.
Musk’s Robot Priority: Elon Musk told analysts that Tesla’s future value lies more in its Optimus humanoid robots than in its cars, hinting at a long-term pivot beyond EVs.
ChatGPT Health Sharing: OpenAI’s new ChatGPT Health feature is already raising privacy alarms over how deeply it connects to medical records, insurers, and third-party health data systems.

The context to prepare for tomorrow, today.
Memorandum merges global headlines, expert commentary, and startup innovations into a single, time-saving digest built for forward-thinking professionals.
Rather than sifting through an endless feed, you get curated content that captures the pulse of the tech world—from Silicon Valley to emerging international hubs. Track upcoming trends, significant funding rounds, and high-level shifts across key sectors, all in one place.
Keep your finger on tomorrow’s possibilities with Memorandum’s concise, impactful coverage.
*This is sponsored content

Brain Snack (for Builders)
Ship opt-outs like contracts. If you crawl or fine-tune on web data, treat robots.txt / ai.txt / llms.txt as hard constraints: log them, enforce them, and prove compliance now, before an EU auditor asks.
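One way to make "prove compliance" concrete is a tamper-evident enforcement log. The sketch below swaps in a plain hash chain (each entry’s SHA-256 covers the previous entry’s hash); this is an assumption about good practice rather than a format any regulation or auditor prescribes, and `append_entry`/`verify` are hypothetical helpers.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def _digest(prev_hash: str, event: dict) -> str:
    # sort_keys makes the serialisation deterministic, so hashes are reproducible.
    body = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def append_entry(log: list[dict], event: dict) -> dict:
    """Append an enforcement decision, chaining it to the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    event = dict(event)  # shallow copy so later caller edits don't leak in
    entry = {"prev": prev_hash, "event": event, "hash": _digest(prev_hash, event)}
    log.append(entry)
    return entry

def verify(log: list[dict]) -> bool:
    """Recompute every hash. Edits, reordering, or removals before the last
    entry are caught; truncating the tail needs an external copy of the latest hash."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True

if __name__ == "__main__":
    log: list[dict] = []
    append_entry(log, {"url": "https://www.example.com/a", "agent": "ExampleBot",
                       "decision": "skipped", "reason": "robots.txt disallow"})
    append_entry(log, {"url": "https://www.example.com/b", "agent": "ExampleBot",
                       "decision": "fetched", "reason": "no opt-out found"})
    print(verify(log))                       # True
    log[1]["event"]["decision"] = "skipped"  # rewrite history
    print(verify(log))                       # False
```

A chain only detects tampering. Periodically handing the latest hash to a party outside your control (a timestamping service, an auditor) is what makes a wholesale rewrite detectable too.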

Wednesday Poll
🗳️ How far should the EU go on AI web scraping?
