Tantek Çelik proposes a “CC-NT” license, for “no-training”:
This seems like an obvious thing to me. If you can write a license that forbids “commercial use”, then you should be able to write a license that forbids use in “training models”, which respectful / well-written crawlers should (hopefully) respect, in as much as they respect existing CC licenses.
I like this. There are fair use and copyright issues to sort out in the courts, but in the meantime we should be using robots.txt and Creative Commons licenses wherever possible. On my blog, I allow any crawling and any use with attribution. Others might prefer to block AI bots and restrict their work to non-commercial use, or even allow commercial use but not AI training.
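For anyone who does want to block AI bots, here is a rough sketch of what that could look like and how to check it. The robots.txt below is a made-up example, not what I use on my own blog. GPTBot is OpenAI's crawler, ExampleSearchBot and the example.com URL are placeholders, and the is_allowed helper is a name I invented for illustration. It uses Python's standard urllib.robotparser to ask whether a given bot may fetch a page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one AI training crawler, allow everyone else.
# Swap in whichever user agents you actually want to block.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/2024/some-post.html"  # placeholder URL
    for agent in ("GPTBot", "ExampleSearchBot"):
        print(agent, is_allowed(SAMPLE_ROBOTS_TXT, agent, url))
```

Keep in mind that robots.txt is a request, not an enforcement mechanism. Like a license, it only works for crawlers that choose to respect it.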
There was a great episode of Decoder last week with The Browser Company’s Josh Miller. Nilay Patel and Josh talk about the open web, browsers of course, and AI. One comment near the end from Nilay stood out to me, where he said AI training gives “nothing” back to writers on the web.
Wait, nothing? Integrating my blog posts into a model with essentially all the world’s information, so that people can ask it questions and have my writing reflected in the answers… That’s “nothing”? Personally, I don’t make money directly from my blog, but there are countless other benefits to blogging. In the age of AI, one of those benefits is letting me contribute in a small way to something bigger, in the same way that I contribute when someone searching on Google finds an answer in one of my blog posts.
The trade-off is different for everyone. Subscription- and ad-based publishers are rightly concerned. They should make deals with AI companies, or in some cases block bots outright. Some people will block bots or use CC-NT on principle alone. No problem. For me, I hope my writing reaches as far as it can, so letting it get slurped up by our future AI overlords is not just acceptable, I want it to happen. It’s not nothing.