Can AI Clean Up Hate?

Plus: AI racism grifts, Square’s AI+Bitcoin stack, and X’s escalating fight with the EU.

Here’s what’s on our plate today:

  • 🧪 Can AI really cut online abuse… or just hide it?

  • 📰 Headlines: AI racism grift, Square’s AI+Bitcoin bet, X vs EU.

  • 📊 Poll: Over-censor, under-enforce, hybrid, or minimal AI moderation?

  • 🛠️ Weekend To-Do: sharpen your abuse philosophy and escalation.

Let’s dive in. No floaties needed…

Launch fast. Design beautifully. Build your startup on Framer.

First impressions matter. With Framer, early-stage founders can launch a beautiful, production-ready site in hours. No dev team, no hassle. Join hundreds of YC-backed startups that launched here and never looked back.

  • One year free: Save $360 with a full year of Framer Pro, free for early-stage startups.

  • No code, no delays: Launch a polished site in hours, not weeks, without hiring developers.

  • Built to grow: Scale your site from MVP to full product with CMS, analytics, and AI localization.

  • Join YC-backed founders: Hundreds of top startups are already building on Framer.

Eligibility: Pre-seed and seed-stage startups, new to Framer.

*This is sponsored content

The Laboratory

Can AI really reduce online abuse?

Avast’s 2025 data shows online victimization has surged to 58.2%, putting new pressure on Meta. Photo Credit: Tech Policy.

The quote “Give a man a mask and he will show you his true face,” often attributed to Oscar Wilde, suggests that people reveal their genuine selves when freed from social expectations.

A mask removes the pressure to conform, letting many people show their real character. In the digital world, the ability to hide a real identity behind a username or email ID convinces some that antisocial behavior will go unpunished, freeing them to indulge in online abuse.

Add the reach of social media, and the problem becomes endemic. With the emergence of Artificial Intelligence, however, some believe online abuse can finally be curbed once and for all.

According to a Reuters report, British athletes will soon have AI support in combating online harassment. That support comes in the form of a partnership between UK Sport and Social Protect.

Under the £300,000 ($400,170) partnership, UK Sport, the government-funded body that backs Olympic and Paralympic programs, will give athletes free access to the platform through the Los Angeles 2028 Games.

The agreement marks a first in British sport. Social Protect, already used in Australia, scans major platforms like Instagram, Facebook, TikTok, and YouTube for more than two million abusive terms and hides flagged comments automatically. Athletes can also add custom triggers.

While the system does not cover X, which a BBC investigation found accounts for most abuse directed at soccer managers and players, it could be the beginning of an AI-led campaign to curb online hate.

However, it is not just athletes and movie stars who face online abuse.

The scale of online abuse

According to Avast’s Cyberbullying Statistics for 2025, lifetime victimization has jumped from 33.6% in 2016 to 58.2%, with 41% of U.S. adults reporting harassment.

This rise in online abuse is fueling a booming AI moderation market. Currently valued at $9.65 billion, it is projected to grow at a 13.10% CAGR from 2026 to 2035.

Further, demand for multilingual, AI-driven moderation tools is accelerating adoption worldwide, particularly in emerging markets where local language content is expanding rapidly. By 2035, the market is expected to reach $33.05 billion.
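
The arithmetic behind that projection holds up. Here is a minimal sketch, assuming the $9.65 billion figure is the base value and the 13.10% CAGR compounds over the ten annual periods from 2026 through 2035:

```python
# Sanity check of the reported AI-moderation market projection.
# Assumption: ten annual compounding periods (2026 through 2035) on the $9.65B base.
base_value_bn = 9.65   # reported current market size, in $ billions
cagr = 0.1310          # reported compound annual growth rate
periods = 10           # annual periods from 2026 through 2035

projected_bn = base_value_bn * (1 + cagr) ** periods
print(f"Projected market size: ${projected_bn:.2f}B")  # ~$33.05B, in line with the cited figure
```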

While this explosive growth signals confidence in AI-enabled tools to curb online abuse, it also reflects a broader debate over whether automated moderation solves the problem or creates new harms like over-censorship and bias.

The rise of AI moderation

Modern AI moderation has moved far beyond basic keyword filters. Current systems use natural language processing, machine learning, and computer vision to flag harmful content, with cloud-based tools now representing about 70% of the market thanks to their scalability and usage-based pricing.

AI systems review content roughly 60 times faster than human moderators and can filter posts before they go live. Multimodal models that analyze text, images, and audio outperform text-only systems, catching violations humans often miss.
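
To make the "filter before posts go live" idea concrete, here is a minimal, text-only sketch of a pre-publication check: a fast blocklist pass followed by a model score with a hold threshold. The function names, terms, and thresholds are illustrative assumptions, not any vendor's API.

```python
from dataclasses import dataclass

# Illustrative pre-publication moderation pass: a blocklist check, then a model score.
# The blocklist, thresholds, and scoring function are placeholders, not a real platform's system.

BLOCKLIST = {"slur1", "slur2"}   # stand-in for a much larger term list
HOLD_THRESHOLD = 0.85            # score above which a post is held before going live

@dataclass
class Decision:
    action: str   # "publish", "hide", or "hold_for_review"
    reason: str

def toxicity_score(text: str) -> float:
    """Placeholder for a real classifier (NLP or multimodal model)."""
    return 0.0

def moderate(text: str) -> Decision:
    lowered = text.lower()
    if any(term in lowered for term in BLOCKLIST):
        return Decision("hide", "matched blocklisted term")
    score = toxicity_score(text)
    if score >= HOLD_THRESHOLD:
        return Decision("hold_for_review", f"model score {score:.2f}")
    return Decision("publish", "passed automated checks")
```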

Newer tools like GPT-5 and Google’s Gemini 2.5 add stronger reasoning and real-time contextual understanding.

Platforms are also experimenting with prevention-focused design. Instagram’s nudging feature, which encourages users to rethink potentially offensive comments, prompted about half of users to edit or delete their messages.

However, these gains only go so far. They do not eliminate the burden of moderation, and the systems themselves can be unreliable or outright wrong.

Where AI falls short

Some models can mislabel positive comments as abusive when they refer to sensitive groups. It has been reported that performance can fall by roughly 30% in non-English languages, and AI struggles with regional slang, sarcasm, and coded expressions.

Studies in online knowledge communities show that users punished by automated systems post less but do not improve quality, while bystanders become more reluctant to participate, raising questions about whether AI moderation reduces harm or simply suppresses engagement.

Complete automation, moreover, remains out of reach and likely unwise. Studies show the strongest moderation systems combine AI for scale and speed with humans for context, cultural awareness, and moral judgment.
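
One common way to combine the two is confidence-based routing: the model acts on its own only when it is very sure, and everything ambiguous goes to a human queue. The sketch below is illustrative; the thresholds and names are assumptions, not any platform's actual implementation.

```python
# Hypothetical confidence-based routing for a hybrid AI + human pipeline.
# Only high-confidence cases are automated; the grey zone is escalated to a trained moderator.

AUTO_REMOVE_AT = 0.95   # model is very confident the content is abusive
AUTO_ALLOW_AT = 0.10    # model is very confident the content is benign

def route(model_score: float) -> str:
    """Return where a piece of content should go, given an abuse score in [0, 1]."""
    if model_score >= AUTO_REMOVE_AT:
        return "auto_remove"
    if model_score <= AUTO_ALLOW_AT:
        return "auto_allow"
    return "human_review"   # context, culture, and intent judged by a person

# Example: a sarcastic post the model is unsure about gets escalated.
print(route(0.62))  # -> "human_review"
```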

Even hybrid models, however, are not foolproof, because AI moderation constantly balances over-censorship against under-enforcement. Tight filters risk silencing legitimate speech; loose ones let abuse through. Experts argue that accuracy metrics obscure the real issue: an unfair distribution of errors that can silence or endanger marginalized users.

Context only complicates matters. AI often misinterprets humor, cultural nuance, or intent, leading to errors such as flagging classical art or historic war photography as explicit content. Despite improvements, results still fall short.

A 2022 probe found Instagram acted on only 10% of abusive messages sent to women, showing how often harmful content slips through.

Even if AI systems could be made fail-safe, there would still be the question of free speech.

Content moderation inevitably requires editorial choices about what speech is acceptable, and those choices become more consequential when automated. Nearly half of U.S. voters worry that government regulation of AI content could curb criticism of public officials, and more than a quarter say they would share less online if such rules were imposed.

Critics of AI use in content moderation also warn that government-moderated content risks overreach, especially when subjective categories like hate speech are misapplied.

Studies show people trust moderation decisions more when made by humans than by AI, underscoring skepticism about algorithmic fairness. Calls for transparency continue to grow, even as full disclosure of algorithms could confuse users or expose sensitive business practices.

The chilling effect is measurable. As noted above, users penalized by AI reduce their participation and bystanders grow more cautious, leaving open whether AI promotes healthier discourse or simply suppresses engagement.

Beyond moderation

Some go as far as to argue that AI moderation targets symptoms rather than causes. Researchers say racism and harassment online stem from broader structural issues, warning that trying to control abuse solely at the level of content circulation becomes “a never-ending Whac-A-Mole game.”

The UN Secretary-General’s 2024 report highlights systemic challenges that automated filters cannot fix: backlash against women’s rights, the rise of AI, and the growth of the ‘manosphere’.

Scholars also note that AI can detect patterns at scale, but human moderators remain essential for context and value-based judgment. The core question becomes one of social norms, not just technological limits: what behavior do we want platforms to promote?

So, while using AI for content moderation has its caveats, moderating without the technology is daunting, given the sheer volume of online content and the growing number of people who need protection now.

Waiting for cultural change means tolerating large-scale harm, and even imperfect AI moderation reduces real-world suffering.

In the end, the question is not only whether AI can reduce online abuse but what the digital mask reveals about us.

Wilde’s idea endures because the internet hands every person a mask that can be worn without consequence. It lifts the pressure to behave as we are expected to and exposes the impulses that surface when social accountability fades. AI may help dim the impact of those impulses, yet it cannot fully change the conditions that allow them to flourish.

Content filters can hide harmful language. Nudges can make people pause. Hybrid systems can ease the burden on moderators and protect users who might otherwise face relentless harassment.

These tools matter because they reduce real harm in real time. But the persistence of abuse reminds us that technology alone cannot reshape the character behind the mask.

The long-term solution will depend on more than detection models or enforcement systems. It will require cultural expectations that elevate responsibility, transparency from platforms, and a collective agreement about the norms we want to uphold. AI can help create safer digital spaces, but only people can decide what those spaces should reveal about who we are when the mask is on.

TL;DR

  • Online abuse is surging, powering a fast-growing AI moderation industry that now scans billions of posts for harmful content.

  • Modern AI filters are faster and more capable, but still miss nuance, struggle in non-English contexts, and can both over-censor and under-enforce.

  • Full automation is a dead end: the best systems combine AI scale with human judgment, yet still risk bias, chilling effects, and eroded trust.

  • AI can blunt some harm in real time, but it cannot fix the social, cultural, and structural forces that keep producing abuse in the first place.

Headlines You Actually Need

Goodies delivered straight into your inbox.

Get the chance to peek inside founders’ and leaders’ brains and see how they think about going from zero to 1 and beyond.

Join thousands of weekly readers at Google, OpenAI, Stripe, TikTok, Sequoia, and more.

Check all the tools and more here, and outperform the competition.

*This is sponsored content

Friday Poll

🗳️ For platforms you run or use, which trade-off on AI moderation feels least bad?


Weekend To-Do

  • Write your ‘abuse philosophy’ in 5 sentences: Define what your product considers unacceptable, what must be removed fast, and what’s painful but allowed.

  • Run a bias spot-check on your own system: Take 20 real or synthetic comments in different languages, tones, and groups (including marginalized ones) and push them through your current moderation stack. A minimal harness sketch follows after this list.

  • Add one human circuit breaker: Design a simple escalation rule. Which kinds of cases must go to a trained moderator, and what information (context, history, examples) do they see?
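
For the bias spot-check, a minimal harness might look like the sketch below: label a handful of test comments by group and language, run them through your moderation function, and compare flag rates. Everything here is a placeholder, including is_flagged, which stands in for whatever your actual stack exposes.

```python
from collections import defaultdict

# Minimal, illustrative bias spot-check: run labelled test comments through
# your moderation function and compare flag rates across groups and languages.

def is_flagged(text: str) -> bool:
    return False  # placeholder: replace with a call into your moderation pipeline

TEST_COMMENTS = [
    # (comment text, group/language label) -- extend to ~20 real or synthetic cases
    ("I'm proud to be a member of this community", "in-group, English"),
    ("Estoy orgulloso de ser parte de esta comunidad", "in-group, Spanish"),
    ("you people always ruin everything", "targeted, English"),
]

flags = defaultdict(lambda: [0, 0])  # label -> [flagged count, total count]
for text, label in TEST_COMMENTS:
    flags[label][0] += int(is_flagged(text))
    flags[label][1] += 1

for label, (flagged, total) in flags.items():
    print(f"{label}: {flagged}/{total} flagged")
```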

Rate This Edition

What did you think of today's email?
