Belatedly realizing that Reddit’s robots.txt change means Micro.blog’s bookmark feature can no longer archive a copy of Reddit pages, because we check robots.txt. This is the kind of trickle-down effect that happens when a site withdraws from the open web: it hurts other services and incentivizes breaking conventions.

Yury Molodtsov

@manton Well, do you even have to follow robots.txt from a legal perspective?

Manton Reece

@yury_mol No, but it seemed like the right thing to do. Now I’m less sure.

Manton Reece

@pratik Mostly because of their size, and because they are making the decision regardless of what the authors of the content may want.

Jon Rozier

I still find the entire situation bizarre… it seems like they want to become a walled garden, but wouldn’t that also reduce their traffic and, in theory, their profitability? Cutting off their nose to spite their face, or something like that. 🤷🏼

Manton Reece

@lostinhaste I guess they are betting on active users still being engaged, in the same way that Facebook is like a walled garden but very profitable. It does seem short-sighted, though, and generally bad for the web.

Bill Seitz

@manton @yury_mol I think you could justify honoring it everywhere except Reddit.

prealpinux

I totally agree 💯

Dan Wineman

@manton You aren’t crawling pages arbitrarily though, are you? I think it’s different when the archiving is requested directly by a user. @marcoarment said recently that he makes a similar choice for Overcast.

David Lynch

@manton That doesn’t sound like a robot, so I’d say you don’t really need to pay attention to it. `robots.txt` is very explicitly supposed to be consumed by automated crawlers. robotstxt.org/faq/what.html

Manton Reece

@dwineman @marcoarment Oh yeah, that was a good episode. What I’m doing is similar to Instapaper.

Dan Wineman

@manton @marcoarment Yeah, or Pinboard or any number of similar services. I’d be hesitant to automatically follow and archive links from a bookmarked page (other than redirects), but I don’t think what you’re doing is a problem.

Rene van Belzen

Wasn’t robots.txt meant to optimize search results, and nothing more? My guess is that Reddit doesn’t want Google to eat Reddit’s lunch, since Reddit content has been prevailing in search results for a long time now.

Manton Reece @manton