Evolving thoughts on web scraping. First, I figured anything goes. Later, I was hesitant to depend on any web site structure that would break, and I wouldn’t work around attempts to stop scraping. Now, I’m back to thinking if you don’t want people to see something, don’t put it on the public web.

Numeric Citizen

There has to be a middle ground here: sharing something without feeling like you have an army of people tracking your every move… Remember this ad by Apple?

Manton Reece

@numericcitizen For sure. I didn’t explicitly say it but that post was inspired by my frustration with Goodreads. 🙂 There are different but overlapping issues for personal data, too.

Manton Reece

@stupendousman I really didn’t mean in that way at all, but I can see how my post was too vague to be meaningful. I was only thinking of web sites that try to “protect” their data despite it being totally public.

Rene van Belzen

The problem with secrets is that once shared, they’re no longer secrets. Luckily it is cryptographically possible to confirm the knowledge of a secret without sharing its contents. Often that’s enough in practice, e.g. to confirm one’s identity without any risk of being doxed. But then governments start to complain, using scare tactics, since they inevitably want to spy on their citizens.

Jarrod Blundy

Around and around it goes. There are several debates like this that have raged on in my head over the years. Not sure if I’ll ever be able to settle on where to land.

Hjalmer Duenow

This is so fraught. I see all the points of view. Even if I stay inside my own yard, I could be on someone’s security camera feed. If I stay inside my home, I’m still interacting with commercial entities who have records of all kinds of transactions. The privacy we once had is truly changed.

Katherine M. Moss

@jarrod I can see this sort of thing for sites for things such as medical, financial, and whatnot; neither of those belongs on search engines. I know I don’t want my financial or medical information available for people without proper authorization to look up and do gods-know-what with. But when it comes to information that you publish yourself on your personal site, without search engines or directories to catalogue these sites… you’re just writing into thin air, and potentially no one will ever learn what you have to share. Considering you choose what you publish on your site, I follow the guideline of not publishing things that I wouldn’t want others to see.

Jason Becker

People have conflated internet technologies with the web. There are plenty of ways to use the internet to communicate non-publicly. And if you choose instead to be on the web, you should, within reason¹, expect it to be public by definition.

  1. Rate limiting, for example, to control costs.

Manton Reece

@cambridgeport90 Totally. Medical and financial institutions have a responsibility to not leak data out to the open web. That kind of private data should always be locked down behind authorization and as secure as possible.

Manton Reece

@pratik @stupendousman Tools should respect robots.txt whenever possible. Micro.blog checks that when archiving copies of bookmarked web pages. To be clear, I wasn’t thinking about personal data at all but instead generic data about things online.
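As an illustration of the robots.txt check mentioned above, here is a minimal sketch in Python using the standard library's `urllib.robotparser`. It is not Micro.blog's actual implementation; the helper name `is_allowed`, the agent name `ExampleArchiver`, and the sample rules are assumptions made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used so the sketch runs without network access.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so an already-downloaded file works too.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/posts/1",
        "https://example.com/private/notes",
    ):
        verdict = "fetch" if is_allowed(EXAMPLE_ROBOTS, "ExampleArchiver", url) else "skip"
        print(f"{verdict}: {url}")
```

A real tool would presumably download the site's own `/robots.txt` first (for example with `RobotFileParser.set_url` and `read`) and run this check before fetching each page.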

Matt Huyck

@pratik @stupendousman I’ve said this before, but I think we are overdue for an internet privacy law. Amongst its provisions should be legal consequences for not respecting robots.txt.

Randy Botti

@fgtech @pratik Interesting thread. I’ve accepted that, due to being oblivious and naive to the dangers for many years, the things I put on the web/internet in the early years aren’t going away. In the past decade, I’ve pulled back and been far more thoughtful about what goes up and where.

Matt Huyck

@stupendousman I’m not really talking about any mysterious dark web voodoo. There are plenty of privacy violations happening out in the open right now. We need some codified standards, and they should be enforceable.

Matt Huyck

@stupendousman Also, why should we throw up our hands and say it’s impossible to enforce so why try? Letting problems fester just leads to more problems. Law enforcement has shown it can get just as creative as the dark side when needed.

Manton Reece @manton