Belatedly realizing that Reddit’s robots.txt change means Micro.blog’s bookmark feature can no longer archive a copy of Reddit pages, because we check robots.txt. This is the kind of trickle-down effect that happens when a site withdraws from the open web: it hurts other services and incentivizes breaking conventions.

@manton Well, do you even have to follow robots.txt from a legal perspective?

@yury_mol No, but it seemed like the right thing to do. Now I’m less sure.

@pratik Mostly because of their size, and because they are making the decision regardless of what the authors of the content may want.

I still find the entire situation bizarre… It seems like they want to become a walled garden, but wouldn’t that also reduce their traffic and, in theory, their profitability? Cutting off their nose to spite their face, or something like that. 🤷🏼

@lostinhaste I guess they are betting on active users staying engaged, in the same way that Facebook is a walled garden but very profitable. It does seem short-sighted, though, and generally bad for the web.

@manton You aren’t crawling pages arbitrarily though, are you? I think it’s different when the archiving is requested directly by a user. @marcoarment said recently that he makes a similar choice for Overcast.

@manton That doesn’t sound like a robot, so I’d say you don’t really need to pay attention to it. `robots.txt` is very explicitly supposed to be consumed by automated crawlers. http://www.robotstxt.org/faq/what.html
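For what it’s worth, honoring robots.txt is straightforward when a service does choose to check it. A minimal sketch using Python’s standard-library `urllib.robotparser` (the user-agent name and rules here are illustrative, not Reddit’s actual file or Micro.blog’s actual code):

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt that disallows all crawlers,
# similar in effect to what a site withdrawing from the open web might serve.
robots_txt = """\
User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# A crawler would check before fetching a page:
print(rp.can_fetch("ExampleArchiver", "https://example.com/some-post"))  # False
```

In practice a crawler fetches the live file with `rp.set_url(...)` and `rp.read()`; the point of the thread is that a one-off, user-initiated archive arguably isn’t the “robot” this convention was written for.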

@dwineman @marcoarment Oh yeah, that was a good episode. What I’m doing is similar to Instapaper.

@manton @marcoarment Yeah, or Pinboard or any number of similar services. I’d be hesitant to automatically follow and archive links from a bookmarked page (other than redirects), but I don’t think what you’re doing is a problem.

Wasn’t robots.txt meant to control search-engine crawling, and nothing more? My guess is that Reddit doesn’t want Google to eat Reddit’s lunch, since Reddit content has dominated search results for a long time now.
