Evolving thoughts on web scraping. First, I figured anything goes. Later, I was hesitant to depend on any web site structure that would break, and I wouldn’t work around attempts to stop scraping. Now, I’m back to thinking if you don’t want people to see something, don’t put it on the public web.

There has to be a middle ground here: sharing something without feeling like an army of people is tracking your every move… remember this ad by Apple?

@numericcitizen For sure. I didn’t explicitly say it but that post was inspired by my frustration with Goodreads. 🙂 There are different but overlapping issues for personal data, too.

@stupendousman I really didn’t mean it that way at all, but I can see how my post was too vague to be meaningful. I was only thinking of web sites that try to “protect” their data despite it being totally public.

The problem with secrets is that once shared, they’re no longer secrets. Luckily it is cryptographically possible to confirm the knowledge of a secret without sharing its contents. Often that’s enough in practice, e.g. to confirm one’s identity without any risk of being doxed. But then governments start to complain, using scare tactics, since they inevitably want to spy on their citizens.
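To make "confirm the knowledge of a secret without sharing its contents" concrete, here is a minimal sketch of one well-known approach, a Schnorr-style identification round. It is not necessarily what any particular identity system uses. The group parameters `P`, `Q`, `G` are toy values chosen for readability (far too small to be secure), and the function names are made up for this illustration.

```python
import secrets

# Toy parameters, insecure on purpose: G generates a subgroup of prime order Q modulo P.
P, Q, G = 23, 11, 2

def keygen():
    """Return (secret x, public y = G^x mod P)."""
    x = secrets.randbelow(Q - 1) + 1  # secret in [1, Q-1]
    return x, pow(G, x, P)

def commit():
    """Prover picks a one-time nonce r and sends t = G^r mod P."""
    r = secrets.randbelow(Q)
    return r, pow(G, r, P)

def respond(x, r, c):
    """Prover answers challenge c without sending x itself."""
    return (r + c * x) % Q

def verify(y, t, c, s):
    """Verifier checks G^s == t * y^c (mod P) using only public values."""
    return pow(G, s, P) == (t * pow(y, c, P)) % P

# One round: the verifier only ever sees y, t, c and s, never x.
x, y = keygen()
r, t = commit()
c = secrets.randbelow(Q)  # verifier's random challenge
s = respond(x, r, c)
print(verify(y, t, c, s))
```

With real-sized parameters, a prover who doesn't know the secret can only pass by guessing the challenge in advance, which is why the verifier picks it at random after seeing `t`.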

Around and around it goes. There are several debates like this that have raged on in my head for years. Not sure if I’ll ever be able to settle on where to land.

This is so fraught. I see all the points of view. Even if I stay inside my own yard, I could be on someone’s security camera feed. If I stay inside my home, I’m still interacting with commercial entities who have records of all kinds of transactions. The privacy we once had has truly changed.

@jarrod I can see this sort of thing for medical, financial, and similar sites; none of that belongs on search engines. I know I don’t want my financial or medical information available for people without proper authorization to look up and do gods-know-what with. But when it comes to information that you publish yourself on your personal site, without search engines or directories to catalogue these sites… you’re just writing into thin air, and potentially no one will ever learn what you have to share. Since you choose what you publish on your site, I follow the guideline of not publishing things that I wouldn’t want others to see.


@cambridgeport90 Totally. Medical and financial institutions have a responsibility to not leak data out to the open web. That kind of private data should always be locked down behind authorization and as secure as possible.

@pratik @stupendousman Tools should respect robots.txt whenever possible. Micro.blog checks that when archiving copies of bookmarked web pages. To be clear, I wasn’t thinking about personal data at all but instead generic data about things online.
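For anyone wondering what "respecting robots.txt" looks like in a tool, here is a minimal sketch using Python's standard `urllib.robotparser`. This is not Micro.blog's actual implementation; the `allowed` helper, the `ExampleArchiver` user agent, and the sample rules are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Hypothetical rules: everything is open except /private/.
rules = """\
User-agent: *
Disallow: /private/
"""

print(allowed(rules, "ExampleArchiver", "https://example.com/posts/1"))
print(allowed(rules, "ExampleArchiver", "https://example.com/private/notes"))
```

A real tool would first download the site's `/robots.txt` (for example with `RobotFileParser.set_url` and `read`) and skip any page for which the check comes back false.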

@pratik @stupendousman I’ve said this before, but I think we are overdue for an internet privacy law. Amongst its provisions should be legal consequences for not respecting robots.txt.


@stupendousman I’m not really talking about any mysterious dark web voodoo. There are plenty of privacy violations happening out in the open right now. We need some codified standards, and they should be enforceable.

@stupendousman Also, why should we throw up our hands and say it’s impossible to enforce so why try? Letting problems fester just leads to more problems. Law enforcement has shown it can get just as creative as the dark side when needed.
