No More Free Data

Plus: Google expands AI languages, Motion raises $38M, and Recraft for workflows.

Here’s what’s on our plate today:

  • 🧠 Why Anthropic’s $1.5B settlement could mark the end of AI’s free-for-all era.

  • ⚡ Google’s language update, Motion’s raise, and Starlink’s new play.

  • 🧪 A smarter Memrise, Qodo’s code QA, and Recraft for designers.

  • 📊 Should AI companies be forced to license training data?

Let’s dive in. No floaties needed…

The guide to global accounting hiring.

Is the accounting talent shortage slowing your team down?

The US Accounting Talent Shortage report explains why hiring CPAs is harder than ever—and what you can do about it.

Learn why over 300,000 accountants have left the profession, and how that’s putting your finance team at risk.

Discover why firms are hiring in Latin America to fill critical roles with top-tier, bilingual accountants.

Download the report and protect your finance operations.

*This is sponsored content

The Laboratory

What Anthropic’s $1.5 billion settlement means for the industry

Ever since generative AI tools entered public consciousness in late 2022, the data used to train them has drawn scrutiny. As more generative AI models became available, interest grew in what these training datasets contain, how their composition shapes model behavior, and where the data comes from.

The sourcing of this data mattered in particular because data scraped from the internet included books, journalism, code, music, and art, much of it protected by copyright. Most AI companies contend that scraping data from the internet constitutes fair use, but many rights holders disagree, which soon raised questions about whether AI companies could scrape data without permission and whether copyright holders were entitled to compensation.

As authors, artists, publishers, and media companies began demanding compensation for the use of their works in training datasets, AI companies faced a string of lawsuits, including suits against OpenAI, Anthropic, and Apple. These lawsuits accused the companies of using copyrighted materials to train their AI systems without prior approval or compensation.

The struggle between online publishers and AI companies has now taken a new turn, with Anthropic agreeing to pay $1.5 billion to settle a class-action lawsuit. Before turning to its implications, it is worth reviewing the lawsuit itself and why Anthropic chose to settle.

The settlement with Anthropic marks the end of a year-long legal battle over its use of copyrighted material in training AI models. The dispute began in August 2024, when authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson accused the company of “building a multibillion-dollar business” by using hundreds of thousands of copyrighted books without permission.

In the early stages, Anthropic won a partial victory: a federal judge ruled that training its AI on legally purchased books fell under fair use. However, the court also ruled that Anthropic must face trial over claims it had relied on pirated books from shadow libraries such as Library Genesis.

The case escalated in July 2025, when a California federal judge allowed the authors to pursue a class-action lawsuit on behalf of all U.S. writers whose work was allegedly used without authorization.

By late August 2025, Anthropic agreed to settle the class action, with the details emerging only later. In a statement to The Verge, Aparna Sridhar, Anthropic’s deputy general counsel, said the June court ruling had already confirmed that the company’s training methods based on legally acquired books constituted fair use. The settlement, she explained, is meant to resolve the remaining claims tied to pirated material, allowing Anthropic to move forward while continuing to emphasize its commitment to developing safe and beneficial AI systems.

Why the settlement matters

The settlement between Anthropic and the authors could mark the beginning of a new era of transparency around AI training datasets, and it fits into a broader movement of AI companies paying for the data used to train their models. The $1.5 billion fund works out to roughly $3,000 per work across an estimated 500,000 downloaded books, and it could grow if more works are identified.

The settlement marks the first real closure of a copyright-related AI case and could shape future dealings between AI companies and artists by setting a benchmark.

According to Reuters, Anthropic said that, as part of the settlement, it will destroy its downloaded copies of the books the authors accused it of pirating. Even under the deal, it could still face infringement claims over material produced by its AI models.

In simple terms, the company will have to legally acquire copyrighted material if it wishes to use it to train its models. So, while the settlement does not set a legal precedent, it pressures the industry to license, document, or abstain from using copyrighted works, which is itself a big win for writers, artists, and publishers.

What does it mean for AI companies?

For AI companies, the settlement is a clear signal that they will have to maintain an audit trail of where their data came from, use vetted licenses, and lock down their archives. No more ‘mystery mirrors’ on random servers.

The settlement does not force model changes or unlearning. The parties explicitly said it won’t directly affect Anthropic’s products, so quality and user experience shouldn’t shift because of this case alone. However, it could prompt AI companies to buy books in bulk and scan them to build a lawful corpus, which, though slower and more expensive, would be legally safer.

According to an AP report, Anthropic had already begun buying books in bulk, but because those purchases came after pirated versions had been used for training, they did not absolve the company of liability. Purchases made before training could avoid that outcome, so AI companies may turn to bulk buying of books and other data to ensure they do not rely on pirated copies.

AI companies also have the option of signing deals with publishers. The agreement between The New York Times and Amazon could serve as a blueprint for future deals; it came after the publisher sued Microsoft and OpenAI for using its content to train chatbots without compensation, accusations both companies rejected.

However, while larger publishers can strike such deals relatively easily, individual writers may find it far harder. AI companies will therefore have to find ways to either acquire their works legally or seek permission before using them to train models. Meanwhile, the brunt of the struggle to determine the true value of artistic work for training purposes may be borne by small businesses.

What does it mean for enterprise users?

Small and medium enterprises have been eyeing AI to reduce costs, but rising model training expenses could drive up API prices and make adoption more costly.

Small businesses will also need to choose vendors that document their data sources, avoid unlawfully procured data, and publish clear governance commitments.

Anthropic already offers a copyright indemnity to commercial customers, and such practices may become more common as lawsuits continue to mount.

The lawsuit against Anthropic and its settlement have made one thing clear: while courts may not be willing to force AI companies to pay for all the data used to train models, they have signaled that how that data was procured matters. Training on legally obtained works may be fair use, but building a corpus from pirate sites is not.

The deal does not settle questions around output ‘regurgitation’, synthetic data derived from training data, or the mass scraping of paywalled material, but it will push the industry to examine more closely how data is procured. Going forward, AI companies may need to license data, verify its legality, and document its sources to reduce exposure to litigation.

Looking ahead, the Anthropic settlement may serve less as a conclusion and more as a starting point for a wave of negotiations between AI companies and rights holders. If anything, it signals that the ‘wild west’ era of unlicensed data scraping is nearing its end. AI developers will increasingly be forced to weigh the legal, financial, and reputational costs of using unvetted datasets against the slower, more deliberate path of licensed data collection.

As regulations catch up, the Anthropic lawsuit could prompt regulators to take a closer look at data procurement. That scrutiny may shift some focus away from the AI race and toward compliance and fair compensation for the artists who make generative AI’s creativity possible.

Wednesday Poll

🗳️ Should AI companies be forced to license training data?


The end of overpriced software.

AppSumo is where entrepreneurs save big on the software they actually need to grow.

Since 2010, we’ve helped scrappy teams, solopreneurs, and side hustlers ditch the endless subscriptions—our lifetime deals mean you pay once and own it forever.

And with a 60-day money-back guarantee, every deal is risk-free. Because building your business shouldn’t break the bank.

*This is sponsored content

Quick Bits, No Fluff

3 Things Worth Trying

  • Memrise (“MemBot”): A language-learning platform featuring a GPT‑3–powered AI conversation partner that helps learners practice speaking through human-like interaction. It supports over 35 languages.

  • Qodo (formerly CodiumAI): An AI tool focused on code integrity—Qodo helps with code generation, testing, and quality control, boosting confidence in your development process. It scored a $40M Series A just last year.

  • Recraft: A powerful text-to-image AI tailored for professional design workflows. Recraft excels at brand consistency, text clarity, and layout control—trusted by designers for refined, real-world-ready outputs.

Meme Of The Day

Rate This Edition

What did you think of today's email?
