There are a lot of interesting stats in this Cloudflare blog post about AI bots. I still worry that we might over-correct when blocking LLM training. For example, CCBot is mentioned, but Common Crawl goes back 15+ years and has a variety of non-AI uses.

Greg Morris

Excuse my ignorance, but what would be the downside of blocking ‘good’ bots?

Presumably search rankings, but if you’re not fussed about that, what else could happen?

Christopher Wilson

@gregmorris Monitoring downtime? That’s the only other thing I can think of.

Manton Reece

@gregmorris Depends on the bot. There could be no direct impact, but I can imagine that if a tool doesn’t work, a user might wonder why it’s broken with only certain blogs.

Manton Reece @manton