This is an excellent post by Molly White about the potential conflict between making the world’s knowledge more accessible through AI and the risk of destroying the foundations for open content on the web:
> The true threat from AI models training on open access material is not that more people may access knowledge thanks to new modalities. It’s that those models may stifle Wikipedia and other free knowledge repositories, benefiting from the labor, money, and care that goes into supporting them while also bleeding them dry.
She also gets at something I tried to articulate in one of my posts last year about putting up roadblocks for crawlers. We don’t want to make the web worse in the process of protecting content from AI training. Molly again:
> Often by trying to wall off those considered to be bad actors, people wall off the very people they intended to give access to. People who gate their work behind paywalls likely didn’t set out to create works that only the wealthy could access. People who implement registration walls probably didn’t intend for their work to only be available to those willing to put up with the risk of incessant email spam after they relinquish their personal information.
AI companies are moving so quickly that it’s going to take the open web and standards organizations a little time to catch up. It’s not hopeless, though. Personally, I do want all of my blog posts, and the entire content of my book Indie Microblogging, to be available to AI models. But if other writers feel differently, there should be steps they can take without also taking a step back from the open web.
I believe all these things:
- AI models with all the world’s information are an incredible resource and will transform education and how we work.
- AI training should respect how authors intend for their content to be used without forcing authors to mangle their own content.
- AI companies shouldn’t take from the open web without giving back to authors and organizations, through both citation links and money.
- AI slop will become a problem for both users and AI training, so we need a web filled mostly with human-generated content.
I remain optimistic in part because despite how divisive AI has become, this year is also seeing an amazing return to open web principles. More people are blogging. More social networks are based on open protocols. We need to be thoughtful in how we navigate all of this, finding the right balance with AI training that doesn’t undermine what we love about the open web.