AI's impact on the open web

This is an excellent post by Molly White about the potential conflict between making the world’s knowledge more accessible through AI and the risk of destroying the foundations for open content on the web:

The true threat from AI models training on open access material is not that more people may access knowledge thanks to new modalities. It’s that those models may stifle Wikipedia and other free knowledge repositories, benefiting from the labor, money, and care that goes into supporting them while also bleeding them dry.

She also gets at something I tried to articulate in one of my posts last year about putting up roadblocks for crawlers. We don’t want to make the web worse in the process of protecting content from AI training. Molly again:

Often by trying to wall off those considered to be bad actors, people wall off the very people they intended to give access to. People who gate their work behind paywalls likely didn’t set out to create works that only the wealthy could access. People who implement registration walls probably didn’t intend for their work to only be available to those willing to put up with the risk of incessant email spam after they relinquish their personal information.

AI companies are moving so quickly that it’s going to take the open web and standards organizations a little time to catch up. It’s not hopeless, though. Personally, I do want all of my blog posts — and the entire content of my book Indie Microblogging — available for AI models. But if other writers feel differently, there should be steps they can take without also taking a step back from the open web.

I believe all these things:

  • AI models with all the world’s information are an incredible resource and will transform education and how we work.
  • AI training should respect how authors intend for their content to be used without forcing authors to mangle their own content.
  • AI companies shouldn’t take from the open web without giving back in citation links and money to authors and organizations.
  • AI slop will become a problem for both users and AI training, so we need a web filled mostly with human-generated content.
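On the second point, the most common way today for authors to signal intent without touching their content is robots.txt. This is only an honor-system signal, not an enforcement mechanism, but it is the one major AI crawlers say they respect. A minimal sketch, using the publicly documented user-agent tokens for OpenAI's GPTBot and Common Crawl's CCBot, and Python's standard `urllib.robotparser` to show how a compliant crawler would read it:

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that opts AI training crawlers out while leaving
# the rest of the open web untouched. GPTBot and CCBot are the
# publicly documented tokens for OpenAI and Common Crawl.
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A compliant AI training crawler is told to stay out...
print(parser.can_fetch("GPTBot", "/posts/indie-microblogging"))       # False

# ...while browsers, feed readers, and search crawlers are unaffected.
print(parser.can_fetch("Mozilla/5.0", "/posts/indie-microblogging"))  # True
```

The catch, as Molly White's point about walls suggests, is that this only works if crawlers choose to honor it — which is exactly why standards bodies need to catch up with something stronger than convention.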

I remain optimistic in part because despite how divisive AI has become, this year is also seeing an amazing return to open web principles. More people are blogging. More social networks are based on open protocols. We need to be thoughtful in how we navigate all of this, finding the right balance with AI training that doesn’t undermine what we love about the open web.

Nick Radcliffe

Really good post.

Personally, I agree with your last three stated beliefs, but not the first. I believe AI (but not LLMs) will one day be an incredible resource and will transform education and how we work.

But chatbots make me more pessimistic about even this weaker version. I hate ‘em.

Manton Reece

@njr We'll have to disagree on the first point, but I get it. It is also still very early in all of this.

Tyler K. Nothing

This may seem like a lateral, but I should point out that robotics has improved significantly over the past two decades, and with the state of capitalism as it is today, AI should not be seen as a positive, but a threat. Capitalists are flogging the hell out of early AI to spur development in hopes of accelerating advancements in two areas, AI models and autonomous robotics. Both of these will likely be used to replace the one thing capitalism cannot control: humans.

It's dark and dystopian and bleak, but taken in context with all of the crazy talk around Mars, the rise of authoritarianism and accelerationists developing a larger following, and humanity's increasing dependence on the "algorithm", I think I'm right. I don't want to be right. I really don't.

Samuel Lison

@manton nice article.

#Kagi search with AI actually spits out citations and sources. I believe they are looking into how they can give back to sources too.

But I agree, AI slop is getting worse. What happens when LLMs learn from LLMs learning from LLMs with no human content?

AI content should be clearly labelled as such also, so that LLMs don't further loop their content.

Manton Reece

@tylerknowsnothing I hope you're wrong! I think AI in robots is a mistake. AI can be great but needs human oversight.

Tyler K. Nothing

I hope so, too. Just look at how far it's come in the last few years, and now we're in the firing chamber of the Trump administration and he's just pulled the trigger. Now we're going to be heading down the barrel for four years. We're just 53 days in now and they haven't even warmed up yet. Trump & Co hate regulation and they effectively control all three legs of the American stool which are supposed to provide checks and balances to the others.

I'm not that hopeful, after all.

Mike Gunderloy

We may need a web filled with human-generated content, but I don't see how we can get one. The cost of generating slop appears to be falling asymptotically to zero, and the incentives for generating it are rising as international criminals move to the web.
