Accessing Publicly Available Information on the Internet Is Not a Crime

us0r · on Dec 14, 2017

What makes this extra ridiculous is the fact LinkedIn built its business on scraping not publicly available information but private address books of unsuspecting users.

jacquesm · on Dec 14, 2017

And spamming those contacts with requests to join that looked as if they originated from your business relations when that definitely wasn't the case.

bhhaskin · on Dec 15, 2017

It's the main reason I don't have a LinkedIn and never will. They are a scummy company.

laderach · on Dec 14, 2017

This is incredibly important. If you dig deep into why LinkedIn is behaving the way it is, it is definitely not an attempt into protecting users' privacy. It's all about maintaining and expanding the ways it can monetize the data that users provide.

This is the type of thing that we risk loosing as the internet matures and internet companies with vested interests gain more power. Setting this type of precedents will absolutely curtail innovation and freedom in the future. Think about it, would Google have been created in an environment that is overwhelmingly siloed and filled with red tape?

I see parallels to the net neutrality discussion in this.

ynniv · on Dec 14, 2017

Access that does not require authentication should never be a crime. If LinkedIn wants the courts to intervene, they must require authentication for their data. If they also want Google to scrape their site, they must require Googlebot to authenticate itself.

gwbas1c · on Dec 14, 2017

> Access that does not require authentication should never be a crime.

Careful, this could legitimize things like accidental denial of service. Depending on circumstances, even basic scraping could cause problems.

(I need to be vague to avoid violating an NDA.) A major internet site had a URL that went something like somedomain/group?id=xxxxx. It turns out that a simple scraper, that called id=1, id=2, id=3, ect, ect, caused a major problem! This was because rendering these pages required significant resources; so most active pages were kept in RAM. Of course, the scraper tried to read everything.

Of course, no one thought the scraper was malicious in any way!

odorousrex · on Dec 14, 2017

>A major internet site had a URL that went something like somedomain/group?id=xxxxx. It turns out that a simple scraper, that called id=1, id=2, id=3, ect, ect, caused a major problem!

This is a failure on the part of the developers at that "major internet site". Using a guid instead of consecutive IDs, a rate limiter, hell even just a cache...or all of the above. There are lots of solutions here.

You have to take robot scraping and indexing into consideration, and assume people will ignore robots.txt. (Certain bots, i.e. msnbot/bingbot are quite aggressive!)

carterehsmith · on Dec 15, 2017

>> This is a failure on the part of the developers at that "major internet site". Using a guid instead of consecutive IDs, a rate limiter, hell even just a cache...or all of the above. There are lots of solutions here.

You are right, but few organizations are sophisticated.. or wealthy enough to employ all of that. I mean, a couple years ago there was a thing that Google's Docs could be enumerated.

And that's Google, they can obiously afford to get competent people working on that, yet they made a mistake (and who doesn't?).

harpiaharpyja · on Dec 15, 2017

Fair enough. It still shouldn't become a criminal issue.

tomarr · on Dec 15, 2017

>You have to take robot scraping and indexing into consideration, and assume people will ignore robots.txt. (Certain bots, i.e. msnbot/bingbot are quite aggressive!)

Who owns LinkedIn again?

dec0dedab0de · on Dec 14, 2017

No, that is a failure of the developer of the scraper. I am definitely pro scraping, but you have to be a good neighbor.

jstarfish · on Dec 14, 2017

How the hell is the scraper dev supposed to anticipate how poorly-written these particular views are with no backend knowledge? If not an automated scraper, a thundering herd from content gone viral would trigger the same result.

bottled_poe · on Dec 15, 2017

Scraping is not an intended purpose for most websites. Unless the website specifically states that this is an intended function, it is not reasonable to assume so. In fact it may be in violation of the terms and conditions of the given website.

nostrademons · on Dec 15, 2017

If the law assumed that only intended functions are permissible, innovation would be a crime. By definition, innovation is finding new and unforeseen uses for resources.

blowski · on Dec 15, 2017

You both make good points. If you make the law too strict you punish reasonable uses of the website, like scraping a few publicly available pages to help users. If you make it too lenient you permit DOS attacks.

It’s not easy to craft a law that will punish bad behaviour without blocking innovation.

kevin_thibedeau · on Dec 15, 2017

I don't intend people named Steve to access my open site so I can sue all Steve's for their felonious behavior?

davvolun · on Dec 15, 2017

I've done some scraping work -- one of my rules of thumb is to always assume the worst of their site and try to be as gentle as possible.

r3bl · on Dec 14, 2017

Oh come on, you're trying to scrape the data out of a black box. You have no idea what their infrastructure is like, and for your purposes, you don't really care.

Of course, some sense is more than welcome, but if my scraper makes one request every 2 sec knocks down your server, it's your fault, not mine.

cmiles74 · on Dec 14, 2017

Weev went to jail for exploiting a similar flaw in AT&T's website[0]. They had a page that, when provided an ICC-ID, would return the matching customer's email address. He supplied a range of valid ICC-IDs and scraped the returned addresses. He was eventually convicted[1].

[0]: https://arstechnica.com/gadgets/2010/06/ipad-3g-user-e-mail-...

[1]: https://www.wired.com/2013/03/att-hacker-gets-3-years/

exolymph · on Dec 14, 2017

And while Weev totally sucks as a person, IMO, it was wrong for him to be convicted in this case. He was punished from AT&T's negligence.

leggomylibro · on Dec 14, 2017

Although, not purposefully exfiltrating loads of data after you've found a vulnerability is like, ethical reporting 101.

Otherwise you get situations like Uber paying out an enormous "bug bounty" totally-not-in-exchange for having their stolen data destroyed. If that person had simply pointed out that they had credentials published in a public repository, how much would they have been paid? Probably somewhere within an order of magnitude of the program's stated maximum payout.

UncleMeat · on Dec 15, 2017

Punk test. Advocacy groups are way less likely to want to turn your case into a test case if you are a racist asshole.

abpavel · on Dec 14, 2017

Are you suggesting that someone should do time for running a script that happen to stumble on one of your bugs?

da_chicken · on Dec 14, 2017

If the activity caused actual damages and was outside the scope of normal usage? Yes.

You're still culpable if your actions break your neighbor's window, even if it was accidentally while you were opening it.

__jal · on Dec 14, 2017

Unless I'm missing something, you're proposing criminal penalties for tort liabilities.

Yes, if my crappy software costs you money by knocking your site offline by accident, I should make you whole.

I think it has to be something substantially more impactful, clearly intentionally malicious, or in some other way much worse than aggressive timeouts before we start thinking criminal penalties.

ouid · on Dec 14, 2017

either I read it wrong the first time as well, or he edited it, but reading it now it clearly says "while opening it" which is a criminal act, in context.

da_chicken · on Dec 15, 2017

No, when I responded it read "be held liable." I must've commented while it was being edited.

alasdair_ · on Dec 14, 2017

Say a business publishes a phone number and they typically get X calls per day.

After doing something that pisses a lot of people off, they start getting 1000X calls per day on the same number, almost all complaints.

This cases actual damages (no "normal" customers can get through) and is also clearly outside the scope of "normal" usage.

Do you think the same rules apply?

da_chicken · on Dec 15, 2017

Yeah, that's not how the comment read when I responded. It said "be held liable," not "do time."

I must've responded while he was editing it, and I didn't catch the change.

sp332 · on Dec 14, 2017

I think you could be sued for damages, but that's not the same as a criminal case.

JepZ · on Dec 15, 2017

Honestly, we all know the wild west mentality of the internet (yes, it is post national as in 'above the law') and therefore, everyone should assume attacks like that and build defenses against them. Building a service which could be brought to a 'major problem' with simple requests leading to high server loads is just negligent. What would that site do if someone actually wanted to attack it?!?

I am not saying that the trouble with the law enforcement in the internet is a neither a good nor a bad thing. Actually, it depends and in the 'real' world I am pretty happy that the law enforcement works quite good where I live. I think the thin line is somewhere where I start to fear my own governments more than the bad guys (while not having any evil intentions or plans at all).

throwanem · on Dec 15, 2017

> the wild west mentality of the internet (yes, it is post national as in 'above the law')

No. It's only been ahead of most laws for a while, as all frontiers are while they remain frontiers. But all frontiers eventually close, and laws catch up with them as they do so. That is what we're seeing now, and have been for a decade or more.

tokenizerrr · on Dec 14, 2017

Honestly that just means the website sucked and it went down because it sucked. Making it not suck is the solution, persecuting the people who stumbled into your suckiness is not.

shkkmo · on Dec 14, 2017

> Careful, this could legitimize things like accidental denial of service.

Are you saying that the writers of a bot that causes accidental issues with a site due to poor development standards on that site should spend years in prison with a federal felony conviction?

tonyarkles · on Dec 17, 2017

A while ago I was introduced to a client whose site "was the target of hackers that were deleting all of the content from the CMS". Here's what I discovered:

- the password verification form to access the admin area did the verification check in JavaScript, not on the backend. So if you have JS disabled and click "Submit" on the Admin login form, you're into the admin area.

- the "delete" button in the admin area was implemented as an <a href=...> that simply did a GET request (violating the idempotent nature of GET requests).

Looking at the logs, it was pretty clear who the "hacker" was: Google. They'd come, follow all of the links, make their way into the admin site, and follow all of the delete content links.

I consider the work that the original developers did to be grossly negligent, and I certainly don't fault Google for anything.

Chickenosaurus · on Dec 14, 2017

Accidental denials of service are indeed a common occurrence. By the way, it's "etc" from latin et cetera - I assume you didn't want to refer to electro-convulsive therapy :-)

grkvlt · on Dec 15, 2017

As any fule kno, this is how Molesworth writes, ect ect ect.

azernik · on Dec 15, 2017

In general, the law is capable of dealing with this kind of issue - it can look at the intent of the owner of the service.

cf. for example the law on trade secrets. If you take "reasonable steps" to safeguard the secret, and impose NDAs on the people you do grant access, then courts will punish competitors who steal them, even if your security happens to suck.

jlg23 · on Dec 14, 2017

> Careful, this could legitimize things like accidental denial of service. Depending on circumstances, even basic scraping could cause problems.

I have to "deal" with that problem every day. Misconfigured scrapers are dealt with by apache as are idiots who try to DoS the site (an intelligent attack still needs manual intervention, though).

mgalka · on Dec 15, 2017

> Access that does not require authentication should never be a crime.

Linkedin is not trying to prevent access. They want to prevent information from being scraped, and then used to their detriment.

feelin_googley · on Dec 14, 2017

Here is an example of the "good bot"/"bad bot" nonsense in action.

This is an article about the LinkedIn v hiQ case at AdWeek.

  curl --user-agent INSERT_ANYTHING_HERE http://www.adweek.com/digital/rami-essaid-distil-networks-guest-post-linkedin-hiq-labs/

It seems AdWeek can distinguish a "good bot" from a "bad bot" irrespective of the behavior of the user^W bot, i.e., whether it is one single HTTP request or 10,000 consecutive requests is irrelevant.

How do they do it?

Pattern match against the User-Agent string.

Effective shibboleth.^W engineering.

Clarification: If a user, not a "bot", makes the "wrong" choice of user-agent string (e.g. in the browser settings), then they will be labeled a "bad bot", even if their behavior is no different than other users who are not labeled "bad bots". For example, they make one HTTP GET request just like any other user. There are databases of "acceptable" user-agent strings available to anyone. If still unsure about the point I am making, see this post from several days ago: https://www.sigbus.info/software-compatibility-and-our-own-u...

alexdoma · on Dec 14, 2017

What would be a better solution, IP address check to allow only known google crawlers perhaps?

andreareina · on Dec 15, 2017

Classify IPs based on their recent behavior[2]. Most bots behave very differently from the median user, along many different dimensions -- volume of requests, time between requests, visit length, which links are followed, etc.

And if this means that bots are altered to become indistinguishable from users, and therefore have a minimal impact on a site's loading? Well, mission accomplished[1].

[1] https://xkcd.com/810/

ETA: [2] Recent behavior (as opposed to all historical behavior) is used so that someone inheriting a "bad" IP isn't completely screwed over.

mark-r · on Dec 15, 2017

That's a superb xkcd that I hadn't seen yet, thanks.

marcosdumay · on Dec 14, 2017

The real solution is disallowing behaviors, instead of shibboleths.

It's surprising that malicious bots aren't exploiting those things already.

ben_w · on Dec 15, 2017

That practically invites them to present a different page to google as to a normal user, the former pure SEO, the latter perhaps pure advertising.

fjsolwmv · on Dec 16, 2017

And Google will happily deindex the site as soon as they find out

ben_w · on Dec 17, 2017

Raising an interesting question: can a website owner (use the law to) ban google from accessing their website by any mechanism other than their crawler in order that google doesn’t find out?

Sure, obviously limited utility just like “the right to be forgotten”s flaw of diffing USA internet from EU internet to find specifically what people want forgotten, but shenanigans interest me.

someonewithpc · on Dec 15, 2017

A related story is that windows 9 isn't a thing because software used to check for windows 95 and 98 by matching the name to "windows 9".

ikeboy · on Dec 14, 2017

>good bots

You mean, bots that obey robots.txt?

https://www.linkedin.com/robots.txt very specifically prohibits scraping by any bot besides a small whitelist.

robots.txt compliance is not difficult to build. I'm fine with robots.txt violations being considered hacking.

diggan · on Dec 14, 2017

> robots.txt violations being considered hacking

Hm, I disagree. Either information is public, no matter for who. Or the information is private, and you should have ACL for accessing the information. I don't think it's fair to say that information is public if you're a human but private if you're a machine, or vice versa.

It's not about if it's difficult to build but rather the principle behind if you can just allow humans to read something.

ikeboy · on Dec 14, 2017

Why is discriminating against robots unfair? There are valid reasons (for instance, robots take a lot of resources to serve and don't lead to revenue).

diggan · on Dec 14, 2017

Just because it's a robot doesn't mean that it takes more resources to load a page. A robot that loads 1000x more pages than a normal user, sure. But then rate-limit everyone rather than blocking specifically bots.

And that bots don't lead to revenue depends on why the bot is navigating on your page no? If it's some indexer that links back to your website and it's a popular index, then you'll maybe end up with more revenue thanks to that bot than a normal user.

ikeboy · on Dec 14, 2017

Accepting robots + humans takes more resources than only accepting humans.

Your arguments about revenue are website-dependant and it's the website owner who is in the best position to decide whether robots are good for them or not (and plenty of sites don't ban bots in their robots.txt). In this case, the company that ran the bots is directly competing with Linkedin's products that sell aggregated data to employers and such, and linkedin clearly decided it's not going to lead to more revenue for them.

ouid · on Dec 14, 2017

my browser is a robot that renders your page.

marcosdumay · on Dec 14, 2017

What is exactly the difference between a robot and a person using a browser?

Does an ad-blocking browser counts as a bot or as a human? And what is something that concatenates all of your infinite scrolling to represent a paginated view? What is something that changes the structure of your page? What is something that concatenates different pages before displaying?

rtpg · on Dec 15, 2017

The real life equivalent of this is "if I leave my door unlocked, should someone be allowed to walk in anyways?"

I would definitely want some intent provisions in, but saying something is accessible therefore free game seems too wide.

MereInterest · on Dec 15, 2017

> The real life equivalent of this is "if I leave my door unlocked, should someone be allowed to walk in anyways?"

The problem with analogies is that many equally valid analogies that can be made, but with many different points. I would argue that the real life equivalent is "Have this free book, but you may not read Chapter 4."

rtpg · on Dec 15, 2017

Well I suppose both are possible, and it's on a case by case basis.

If I put on my website terms of service "please don't try to go everywhere" , and then you do... seems like you did _something_.

I don't really get what sort of stuff is enforceable, though.

scooble · on Dec 16, 2017

Or putting up a poster that only some people are allowed to look at (or that the google maps car isn't allowed to photograph).

chii · on Dec 15, 2017

> ACL for accessing the information

the ACL is the robots.txt. A door with or without a lock doesn't determine whether the place is public or not.

hhh · on Dec 15, 2017

if my bot is actually my cat actuating a switch for it to load a page, does it have to follow robots.txt?

joelanford · on Dec 15, 2017

robots.txt is more like a sign that asks certain people not to look at a bunch of other publicly visible signs.

One can't post a sign in public that tells people not to look at other publicly visible signs and expect the government to arrest or fine them for ignoring it.

lloeki · on Dec 15, 2017

robot != UA

What if I user curl to pipe web content to my mail so that I can read it in a quirky way? What if I write a Chrome extension to crawl a site? Where does w3m stands?

This is not a question of the tool (UA) but of the intent (mass crawling, indexing, mass-replicating stuff). robots.txt is made as hints for crawlers and the like, not optimistically ACL whether something is public or not.

Dylan16807 · on Dec 15, 2017

robots.txt cannot change whether something is public, because it doesn't apply to humans.

martin-adams · on Dec 14, 2017

For the most part I agree, but I feel there are grey areas. Things like web browsers (which are not robots) can access the content as though they are from a human. But what about extensions or apps that do things in the background, such as caching the contents of several pages for offline viewing. Is that now considered a bot.

The robotstxt.org site states that a robot "should" obey the rules. "should" is not a legal term that implies compliance. "must" would have been more appropriate to indicate enforcement.

_dcwr · on Dec 14, 2017

That file includes at least two non-standard syntax extensions[0]. Robots is just a de facto standard and respect of some directives varies[1]. So much for it being 'not difficult' while the task is not even clear because there isn't even a clear standard.

Archive.org also dislikes how robots.txt is being used mainly for search engines and goes against their mission in particular[2]. Are they now hackers for not throwing away information just because someone was overzealous with robots.txt or retired a certain website and uses robots.txt as SEO to let another one take its place in Google search results?

If some big corp wants to cry and bring legal matters into software they should first be accountable themselves for not securing themselves and the data of their clients (see the LinkedIn hack people mentioned elsewhere here and in general the high profile hacks like Equifax, Sony, etc.). Or should software shape up to be like many other areas today are - multi-million corporations are free to play fast and loose and endanger people while small guys get fried over meaningless bullshit and vaguely defined "crimes".

[0] - https://en.wikipedia.org/wiki/Robots_exclusion_standard#Nons...

[1] - https://intoli.com/blog/analyzing-one-million-robots-txt-fil...

[2] - https://blog.archive.org/2017/04/17/robots-txt-meant-for-sea...

ikeboy · on Dec 14, 2017

It contains

User-agent: * Disallow: /

I am pretty sure none of the standard libraries/ tools that respect robots.txt would continue after being fed that file.

>throwing away information

This is entirely irrelevant. If they receive data from someone they have no obligation to discard it because of the current status of robots.txt. The question would be if they should continue to actively scrape that website.

It seems like they've done that for gov sites, but nobody particularly cares about enforcing gov robots.txt. It would've been interesting if the government sued them, although if they cared they probably would've just told them to stop.

_dcwr · on Dec 14, 2017

So we have an unclear "standard" that is only a de facto standard (and still varies in more advances directives between few big bots) that you're "pretty sure" about but that's seemingly not written down in its entirety anywhere and it'd also be enforced selectively depending on whether or not "someone particularly cares". Truly perfect and foolproof law that would be.

And all this to protect some corp's business model of not letting others collect automatically the public information they provide, while they are free to use outdated or buggy software, store passwords in plaintext, etc. and get away with leaking data of millions of customers that should never be public.

And it'd fail to stop anyone except benign, private and low fund actors because instantly Indian (or other low wage country) services for "scraping by human thus not a bot ignoring robots.txt" would pop up, just like there are captcha solving services that employ humans already, and malicious bots wouldn't care anyway just like they make 0 effort to respect it now and run from servers in some country that isn't friendly towards USA so there is 0 potential for catching the perpetrators.

ikeboy · on Dec 14, 2017

I disagree that laws that can only be enforced against US companies / people are worthless.

Requiring a human would increase costs and it doesn't seem like a good argument against anything.

_dcwr · on Dec 14, 2017

But they are in this case. They would not stop any scraped data from popping up for sale in shady places. That can be done by LinkedIn or whoever themselves using some smart way to detect bots and stop them from scraping their website.

The only people a robots.txt law would affect are private users who set up a Python script to scrape a single page for themselves to check for something, things like archive.org, researchers, automated website testers, etc. while anyone nefarious can just rent a shady VPN or use a server in Russia, China, Middle East, etc.

Requiring a human barely increases the cost if that data is so valuable in the first place and would be last resort anyway, far after just running the bots from a shady country, for captcha it's done because it's technically easier/cheaper (although supposedly automated solvers exist too).

But laws that punish outright gross negligence would help protect everyone who uses these American websites (and most of the world does) from data leaks of data that is arguably way more sensitive (emails, unhashed passwords, SS and CC numbers, real names even like in Ashley Madison case, etc.).

LinkedIn used sha1 with no salt as recently as 2012 (when they were hacked) for passwords and over 100 million such username + password combinations got stolen. Not only is sha1 not good enough for passwords but for many common and simple words (yes, yes, they are bad passwords, but people do use them) just googling can "crack" them due to lack of salt. The law should either go both ways or neither.

To suggest such heavy handed laws like considering robots.txt ignorance hacking while multi million corporations with millions of users get away with stuff like that (and I mean true negligence of most basic practices, not some obscure bug in the underlying software or something else that isn't absolutely obvious) over and over and over again that every random free my-first-login-page and my-first-SQL-injection-prevention tutorials advise against is absolutely ridiculous and anti-consumer.

ThrustVectoring · on Dec 14, 2017

>robots.txt compliance is not difficult to build. I'm fine with robots.txt violations being considered hacking.

I'm not. You can set up a server to serve different versions of robots.txt to different folks. A malicious actor could deliberately feed inputs to a specific crawler that convince it to violate the terms of the robots.txt it serves to everyone else, and then press for criminal charges against the operator of the scraper.

In a sufficiently adversarial relationship, this lets website owners turn any well-behaved site scraper into criminal activity. That's not a power we want to grant.

alasdair_ · on Dec 14, 2017

>I'm fine with robots.txt violations being considered hacking.

Okay. Start with something simple then - how would you define a "bot" and thus subject to your robots.txt rule?

Is my web-browser a bot? What about a proxy? What about a deaf persons screen reader?

If my web-browser pre-fetches links near my mouse pointer, is that a bot? What if it downloads the whole of an article split over, say, ten pages?

I think of robots.txt similar to posting a "No Trespassing" sign. For a private residence, it's almost not even required, yet for something like a shopping mall during opening hours, the default assumption is that anyone is allowed to be there without a specific invitation, until they are expressly asked to leave and not come back.

Trying to nail down the exact line is a tough issue.

rhema · on Dec 14, 2017

I don't know. The pathological case could include a rapidly changing robots.txt. Think about archive.org's policy. If they suddenly find new restrictions on a domain, they hide it in their waybackmachine. Sometimes an old site will go down and be replaced by totally new owners. This breaks some domains of the waybackmachine retroactively.

ikeboy · on Dec 14, 2017

I think judges are able to ask some questions and tell the difference between an honest mistake and a flagrant disregard for robots.txt, if that were to be the legal standard.

cmiles74 · on Dec 14, 2017

Honoring the robots.txt file is voluntary and ignoring it should in no way be considered hacking. I would go so far as to say that any activity that someone could engage in, simply by loading a URL, should in no way be considered hacking.

Not only does it make it way too easy to prosecute software developers, it really devalues the term "hacking".

dec0dedab0de · on Dec 14, 2017

Sometimes you can do SQL injections just by loading a URL

cmiles74 · on Dec 14, 2017

Perhaps that shouldn't be construed as hacking either. If I send a link to someone via email, they shouldn't need to worry about breaking the law if they click it.

I do think that a company who has been victim of a SQL injection attack will have a better chance in court then, say, LinkedIn in this specific case. At least this theoretical company has made some small effort to protect their data, however inept.

mcguire · on Dec 14, 2017

OTOH, if HiQ employed a team of people to surf to Linkedin and physically type the information into their databases, that would be ok?

bambax · on Dec 14, 2017

> I'm fine with robots.txt violations being considered hacking

Really?? That would mean private corporations, or private citizens, can write laws.

ikeboy · on Dec 14, 2017

You can put up a "no trespassing sign" on your property (although there's some debate as to how much that actually counts for - a quick search pulls up https://www.washingtonpost.com/news/volokh-conspiracy/wp/201...)

jstarfish · on Dec 14, 2017

Robots.txt is not a 'no-trespassing' sign. Robots.txt is a 'whites-only' sign.

The information is available to the public, just not for certain classes. This is and should be legally unenforceable.

If something is truly meant to be private it should not be referenced from a public-facing page or it should have access control enabled.

RcouF1uZ4gsC · on Dec 14, 2017

Robots.txt is more like a "No trucks allowed on street" sign. It allows uses that are typically associated with individuals (viewing a web page, being in a car), while disallowing things that are normally associated with business (web scraping, driving a truck).

ikeboy · on Dec 14, 2017

Except race /skin color is a legally protected class, and robots aren't (and why should they be? They can't enter into contracts, conduct business, etc. So it's perfectly legitimate to exclude them from a site where they cannot use it in the intended manner).

"If you truly didn't want trespassers you should've put up a gate."

jstarfish · on Dec 14, 2017

Bots being legally protected as a class or not, using robots.txt as the ultimate test of what distinguishes normal traffic from CFAA violations is a very flawed mechanism. It turns your website into a minefield.

As a property owner, a no-trespassing sign won't protect you from the lawsuits that result when a toddler drowns in your pool. You're expected to do more (like putting up that gate).

Equifax's systems are peppered with "no-trespassing" motds at login. They also have a robots.txt file. We expected them to do more.

Same for leaving keys in your ignition, guns unlocked on your nightstand, etc. "Don't touch" signs won't absolve you of responsibility when either gets stolen and used in a spree killing.

So yes, as the owner of any sort of asset, in most contexts it is your responsibility to implement access controls to keep unauthorized traffic out.

grkvlt · on Dec 15, 2017

> won't protect you from the lawsuits that result when a toddler drowns in your pool

Good analogy. I wonder i operating fa poorly secured website that leaks private information could be seen as an 'Attractive Nuisance' [0] and the owners could be prosecuted for that, rather than the hackers!

0. https://en.wikipedia.org/wiki/Attractive_nuisance_doctrine

33W · on Dec 14, 2017

Maybe it's closer to a No Trespassing sign, written in a language that only certain classes will understand.

Frqy3 · on Dec 14, 2017

In this case it is a "white bots only" sign, as it allows some bots but wants to block the rest.

Even for those who think that robots.txt should be enforceable, allowing some bots but not others makes it difficult for a new player to have the same equitable access to information as the big players.

bdamm · on Dec 14, 2017

They can anyway. That's what contracts are.

cabaalis · on Dec 14, 2017

I was about to agree that robots.txt prohibitions should be considered a form of authorization.

But I think what is being argued is that "if it's publicly available on a URL, it's available for any client to download and use." I think the latter argument holds more water, as since they are making it publicly available it is implicit authorization.

hardtke · on Dec 15, 2017

If robots.txt allows Google and Bing but nobody else, it should be ignored. If it blocks everyone, then I agree. We need to make sure that the next Google has a chance to succeed.

grkvlt · on Dec 15, 2017

Interesting. They say that crawling is prohibited there, actually, and have a blanked 'Disallow' at the end.

    # Notice: The use of robots or other automated means to access LinkedIn without
    # the express permission of LinkedIn is strictly prohibited.
    ...
    User-agent: *
    Disallow: /

All the listed bots are only able to access a small subset of pages, the same for each bot apart from one. The 'deepcrawl' bot is privileged, and gets to see the '/profinder' pages, for some reason?

    # Profinder only for deepcrawl
    Allow: /profinder*

Anyone know who operates this bot?

anilgulecha · on Dec 14, 2017

robots.txt have no legal validity.

ikeboy · on Dec 14, 2017

I mean, it seems to have been cited in the lawsuit. See e.g. https://static1.squarespace.com/static/5803b57737c581885cbd0... and search for it.

dognotdog · on Dec 14, 2017

I doubt a bot could legally agree to a license put into robots.txt, even if it were able to make sense of it, and a human is never expected to read it.

The purpose of robots.txt is to guide bots away from circular links and such that would result in bogging down the site and causing undue amounts of nonsense traffic.

The purpose of robots.txt not access control.

EDIT: typo fix

ikeboy · on Dec 14, 2017

The human is expected to use it to not scrape sites that prohibit it.

scoot · on Dec 14, 2017

Although it appears the court found for HiQ (against LinkedIn): https://regmedia.co.uk/2017/08/14/hiqlinkedintro.pdf

ikeboy · on Dec 14, 2017

Temporary injunction, not final decision.

decasteve · on Dec 15, 2017

Do Not Track compliance is even easier to build. Does the same logic apply?

TheCoelacanth · on Dec 15, 2017

Yes, this is an extremely good point. If failing to follow robots.txt is a criminal violation of CFAA, then using any of my computers resources (cookies, javascript, etc) to track me while I am sending a DNT header is also a criminal violation of CFAA.

I would almost be willing to concede making not following robots.txt a violation of CFAA if the trade-off was Mark Zuckerberg being brought up on several billion felony charges every year.

paulus_magnus2 · on Dec 14, 2017

How about: if you want me not to scrape it, keep it off my internet??

Actually I'm considering building "API-fication" of websites with bindings for major languages (Java, Python, JS). With luck websites could participate by providing & maintaining a parseable API-sitemap.

This would open door to my 2nd project: orchestration a-la BPEL on top of websites. visual editor, macros, scripting. Call this PIPES 2.0

_mhr_ · on Dec 15, 2017

Can you provide some use-cases for why this would be useful in a way that wouldn't violate most sites' ToS?

paulus_magnus2 · on Dec 15, 2017

- a lot of online stores, hotels need to constantly update prices based on what competitors do.

- cleaning (big) data. Automatically reconcile data to canonical format / names using authoritative source (say wikipedia)

Can you understand even the simplest TOS? I'd argue most (all?) are too restrictive to be enforceable. https://tosdr.org/

Analemma_ · on Dec 14, 2017

I mentioned this before in a previous thread on this topic, but I can't support the EFF on this. This is, at the end, an argument against control over ones own data: LinkedIn might be doing sketchy things with your data, but it's all stuff you voluntarily agreed to in exchange for their service. If any shady data aggregator can vacuum it up and do whatever, I didn't consent to that and I'm not getting any benefit from it. The EFF shouldn't be defending that right.

bo1024 · on Dec 14, 2017

But the EFF isn't arguing that any shady aggregator should be able to vacuum up anything. LinkedIn would still have the full right and ability to implement limits, blocks, or so on to prevent this. LinkedIn could still make it against their terms of service and pursue a civil suit. It just would stop LinkedIn from being able to pursue felony hacking prosecutions against people for accessing a public webpage with a script.

guywaffle · on Dec 14, 2017

Make it fair then! Bots can’t scrape LinkedIn, and LinkedIn can’t sell any consumer data to third parties.

JepZ · on Dec 15, 2017

For real: I really hate corporations 'stealing' data from my phone. For example Google likes to introduce new sync options to Android and every time they do so it is activated by default. So as soon as the update arrives their software syncs my data to their servers without my consent. They probably have some clause in the EULA but as a user of their products I really hate that behavior. A similar case is not being able to disable address book sync before it syncs the for the first time.

Those things should be crimes as the data they fetch is not publicly available on some web page but exists only on my personal device and they take it without my consent.

kbart · on Dec 15, 2017

Install a firewall (for example, NoRoot Firewall) and whitelist only these apps/services you want to access Internet.

UncleEntity · on Dec 14, 2017

How does a website put reasonable limits on access?

I'm not saying what Linkedin is trying to do is right but it seems to me there needs to be a way to say "Dude, that's not cool." A regular B&M store can refuse service to disruptive people and trespass people who don't comply, why not servers?

--edit--

Pretty much what rayiner is saying, they posted while I was typing.

jimktrains2 · on Dec 14, 2017

> How does a website put reasonable limits on access?

1) Blocking TCP connections

2) Returning a 4XX error, perhaps even "401 Authorization Required", "402 Payment Required", "403 Forbidden", or "429 Too Many Requests"

> A regular B&M store can refuse service to disruptive people and trespass people who don't comply, why not servers?

A Brick and Mortar store has to _tell_ you you're being banned. The mechanisms I listed above both tell you and lock the door whenever you attempt to access.

Edit: In this case, it's more like someone was looking in the store window from the public sidewalk and asked to stop. Can you really ask someone to stop looking at you from a public place?

amyjess · on Dec 14, 2017

LinkedIn sent HiQ a C&D. They were indeed told that they were banned.

Let's try a thought experiment: you're at a supermarket, and you're abusing coupons to the point where you're holding up the line for everyone. Someone complains to the manager, and the manager escorts you out of the store and tells you you're banned for life (as an aside, I wish this would happen to extreme couponers).

The supermarket also has automatic doors and a self-checkout. They're also pretty understaffed, so there's a good chance you won't run into anyone stocking the shelves as you're shopping. A few days after you've been banned, you waltz in through the automatic doors, grab some items off the nearest shelf, go through the self-checkout, and leave without a single employee getting a good look at your face. At the end of the day, the manager starts fast-forwarding though the day's security camera footage looking for anything odd and notices you've been in the store. They call the police and have you charged with trespassing.

Do they have a case, yes or no?

I say yes.

jimktrains2 · on Dec 14, 2017

Because of the minimal amount of LinkedIn resources utilized and this not apply to all robots/extreme couponers, wouldn't this be more like a competitor checking your weekly ads posted on your front window?

Walking into a store is a clear violation of private space. Is looking at their window?

So, if you had to have an account to view any linked in information, and you got a c&d and your account banned, and you sign up for a new account, I think it would be like entering a store you've been banned from. But we're talking about information available from a public space: on your window or without an account.

I also take issue with the CFAA being used here. I'm sure there are other laws more applicable to keeping someone from talking with you.

To recap: I don't think LinkedIn is wrong to ask them to stop, I just don't think they're using the appropriate means of forcing them to.

UncleEntity · on Dec 14, 2017

> In this case, it's more like someone was looking in the store window from the public sidewalk and asked to stop.

I think it's more like calling the store and asking them what their prices are 20 times a minute.

mcguire · on Dec 14, 2017

No, it's more like you holding the giraffe while I fill the bathtub with brightly painted power tools. Because reasoning by analogy sucks.

No one is accusing HiQ of performing a denial of service attack.

UncleEntity · on Dec 14, 2017

> ...you holding the giraffe while I fill the bathtub with brightly painted power tools.

I'm down for that.

jimktrains2 · on Dec 14, 2017

...and the coffee shop doesn't block your number.

Also, a phone call consumes, as a percentage of available resources, vastly more than an HTTP request.

Disregarding that though, I think you'd need a court order telling someone not to talk to you, and you'd have to take action to prevent them as well, blocking their number and tell them to stop before that would be granted. If they persisted after being told explicitly and having their number blocked, then yes, I do think legal action would occure and be swift.

I would also assume, presumably, that "you" can be extended to be an automated phone system. (Which is still more limited in capacity than a server would be, but even disregarding that.)

FWIW, I'm not saying that "hiQ Labs" is blameless or acting in good faith. I'm saying that unimpeded access to publicly accessible information requires more than asking someone to stop and that the CFAA isn't the right tool for this.

I'm not an expert in this field, but I doubt the vast majority of anyone in this thread is. It also becomes interesting because I believe the CFAA has been used in similar situations before, but those were where the accessed knowlege could be assumed to be private, even if made public (client details at a phone company, or articles known to be behind a paywall) (and not that I agree with its usage there either, but the data accessed there could be assumed, by a reasonable person, to not be public).

So the key thing here is: if something is publicly available, can I ask you to stop looking at it, or do I need a more stringent court order to prevent you from viewing public information?

And in this case, I do think the capacity constraints disregarded above would come into play. I think the courts would look differently at someone calling your clerk 20 times a day vs looking at a menu you post on the window.

UncleEntity · on Dec 14, 2017

> ...I think you'd need a court order telling someone not to talk to you, and you'd have to take action to prevent them as well, blocking their number and tell them to stop before that would be granted.

Like, for example, sending a C&D letter?

This whole hubbub is over them sending a C&D, they just made the mistake of trying to use the CFAA as a means to enforce it -- which, honestly, hiQ is fighting the good fight trying to stop.

jimktrains2 · on Dec 14, 2017

A c&d is not a court order. It is a not-so-polite request and warning that further action will be taken.

Edit: If that's your point I agree with you. C&d followed by some more appropriate (than the cfaa) seems like a not-raise-everyones-backs approach.

mark-r · on Dec 15, 2017

That raises a point - would hiQ be liable in a civil suit if the CFAA were not a factor?

oh_sigh · on Dec 14, 2017

Looking in a window from a public place doesn't use any resources of the company being looked-upon.

TheDong · on Dec 14, 2017

This legal complaint is not about resources used; it's not a "They criminally DDoSes us".

This lawsuit is an attempt to stop competition by curbing access to data, not about ensuring reasonable use of apis and rate limits.

cmiles74 · on Dec 14, 2017

They have many options. They can rate limit access by IP address, they can keep information they'd like not to be scraped behind login screens. And so on.

mirimir · on Dec 14, 2017

They could add requisite code.

Miner49er · on Dec 14, 2017

weev went to jail for accessing publicly available information from AT&T. There's not a great precedent here for the EFF, unfortunately.

icebraining · on Dec 14, 2017

It was only a jury decision by a lower court, it doesn't mean much in terms of precedent.

k3a · on Dec 16, 2017

I think scraping for personal use (not honorig robots.txt) should always be legal unless you are attempting DOS. You are accessing public information, the server is returning HTTP200 and it doesn't matter if you do so using a browser, phantomjs or curl with -A parameter.

A different situation would be scraping a website to make business. Worst being directly using the data - for example those StackOverflow clones with original data doesn't sound ok to me. I am not sure what to think about bots doing various derived work like stats and analysis. I think that if they are part of a business, making money, it shouldn't be legal unless those request are permitted by robots.txt.

euske · on Dec 15, 2017

Question. How this principle can coexist with the idea of "surveillance is bad"? Because that's mostly to collect publicly available information. Is it bad because it's done by a government? It's possible to set up a bunch of privately owned cameras in a city and keep filming people. Is it the association of infos that makes it bad and not mere collection? Is it okay if it doesn't have a personally identifiable information (but who knows what one can make out of them)? I don't know what I should think of this.

Semiapies · on Dec 15, 2017

This thought process always bewilders me. Whenever it comes up that government agencies monitor our emails and phone calls, someone, as if on cue, always pipes up that that's totally no different from people posting on their Facebook timeline and other absolutely mind-bogglingly bad equivalences.

You, however, go the extra mile, here. How about you explain exactly how accessing published information on a public website is like building a network of cameras to monitor a city with?

Dylan16807 · on Dec 15, 2017

Surveillance is bad but it is also not hacking.

Boom. Easy to have both opinions.

I would love to limit corporate databases, but not via letting website owners declare arbitrary use to be criminal.

misterhtmlcss · on Dec 15, 2017

Can data that is supplied with an intention to be publicly accessible i.e. public domain be restricted. If the public was asked, "When you supplied your picture, your name, and then created a public URL to become fully searchable, was your intention that that information was to be restricted or was your intention that this was information you publicized about yourself to make it possible for potential employers to find you?". Answer, "Yes, it was 100% my intention to become searchable so that employers would be able to seek me out". Conversation is over.

LinkedIn creates an implied covenant with public consent (mostly) to then publish and make discoverable their professional profiles.

While LinkedIn 100% should have the right to stop others from embedding without permission since it's possible to claim the data structure and presentation is proprietary to them, this should never extend to the actual data itself, since this was willing gifted by the actual owners (Joe public) into public domain.

I think an argument could be made that LinkedIn is being burdened with a degree of data mining that affects their business and therefore should be able to charge a minimal fee e.g. an API firehose to acquire the data in bulk from providers in an raw data stream.

That seems reasonable depending on the charges associated with that offer, this would be the correct compromise, since their data structure is all that actually separates their service from say About.me or any other site of that type. All of which don't disallow scraping; as long as it doesn't present as a DOS attack (of course).

Anyway my comments are as a marketer and not a programmer or lawyer, but personally I'm very interested to see this case resolved in a manner that doesn't suit LinkedIn in the slightest.

tptacek · on Dec 14, 2017

Are they arguing that it's a crime, or that it's a tort?

metallah · on Dec 14, 2017

I believe the latter (though IANAL)

rayiner · on Dec 14, 2017

There is a difference between public property and private property that is made available to the public. Just because the cafe on the corner has its door open and lets you stroll in off the street doesn't mean that the property owner doesn't retain the right to exclude people. And if the property owner revokes your permission, then going onto the property again can be a crime (trespass).[1]

Servers are no different. The Internet isn't an abstraction--it's just pieces of private property connected together (servers, routers, switches). When you make an HTTP request, you're accessing a piece of private property. The owner of that property has every right to decide not to let you do so.

cybwraith · on Dec 14, 2017

That's not a great analogy. The store owner can't just get your arrested/charged with a crime if they don't tell you that you aren't allowed first. Http lacks such a human mechanism. The closest thing I can think of in the standard is the response code. So your server replying 200 OK should implicitly be considered permission to access that resource legally until it stops replying with that code.

rayiner · on Dec 14, 2017

But that's exactly what happened here:

> LinkedIn sent hiQ cease and desist letters warning that any future access of its website, even the public portions, were “without permission and without authorization” and thus violations of the CFAA.

The EFF's point about terms of service is a good one, but also irrelevant. Terms of service don't provide adequate notice that someone's implied license to access a website has been terminated. But here, hiQ had actual notice through "human" channels.

cmiles74 · on Dec 14, 2017

The poster is arguing that if you make a request from LinkedIn's website and it returns a "200" along with data, then you've accessed that data lawfully and LinkedIn has agreed to serve it to you; I tend to agree. If they don't want to provide data to hiQ, they should, well, stop providing data to hiQ.

There are many ways to do this short of claiming that hiQ doesn't have permission or authorization, an argument strikes me as wholly without merit. If the data is publicly available on the internet then how is permission or authorization required?

rblatz · on Dec 14, 2017

How is that any different than walking up to a store entrance with automatic doors and a sign that says "Welcome" on it?

cmiles74 · on Dec 14, 2017

Well, for one it's not a physical store nor a physical entrance and there is no sign that says "Welcome". I don't think the analogy is helping to make anything more clear... It's possible it's making things more confusing.

In my opinion, the bottom line is that if LinkedIn doesn't want to serve data to this company, then they should immediately cease doing so using the many well established means available to them.

For LinkedIn to claim that following a URL and downloading the data is somehow "hacking their website" is entirely ludicrous. I understand they had a lawyer tell this company that they didn't want them to visit the URL, but I don't see how that somehow turns lawful web browsing into illegal hacking.

mrguyorama · on Dec 14, 2017

Those doors get turned off at night, just like a server can ignore an HTTP request

rblatz · on Dec 14, 2017

They can turn the servers off at night too. Some places still choose to do that. But that is unrelated to the point, if you are told that you are no longer welcome at a business, you can’t come in without it being considered trespassing. The doors automatically opening for you (200 Ok) doesn’t matter. If you wear a disguise (change ip) doesn’t matter. You can’t go in.

Also I would agree that absent a specific order to stop accessing publiclly available server resources, there is an explicit permission to do so. So I’m the case of Weev I think he did nothing wrong, AT&T were the ones in the wrong.

yorwba · on Dec 14, 2017

The store owner could have told you that are not welcome at any time of day. I don't think a generic "Welcome" sign or automatic door would override that.

jimktrains2 · on Dec 14, 2017

In the coffee shop example, would this be like trying to sue someone who is banned from your shop from looking in the window at your price list? In this case, it's more like LinkedIn is attempting to get a PFA order, but I think they need to show abuse, not just looking in the window at the menu you posted on the window?

rayiner · on Dec 14, 2017

No because that's not how computers work. Computers don't just emit radiation into the aether that anyone can capture. Accessing a website involves making a physical piece of property do something in response to your HTTP request.

btown · on Dec 14, 2017

If you are notified in writing that you're banned from a coffee shop, but you walk up to the front door and the "server" (pun intended) greets you warmly and allows you to enter, is that "implied consent" that overrides the prior explicit anti-consent, and therefore undermines the legal authority of that ban?

rayiner · on Dec 14, 2017

I think almost any judge or jury would find it implausible if you told them you thought the written ban didn't apply anymore because the server still let you into the coffee shop. We intuitively understand that written notice from a property owner carriers more weight than the actions of one of their workers. I think the same exact reasoning applies where the "worker" is a computer server.

amyjess · on Dec 14, 2017

I agree with you, but I think an even better analogy would be a supermarket with automatic doors.

If someone was walked out of a supermarket and explicitly told that they were banned for life, and they tried to claim that the ban was lifted because the automatic doors opened for them, they'd be laughed out of court.

You could extend that further and say that the supermarket has a self-checkout. You may very well be able to walk through the automatic doors, grab something off the shelf, check it out yourself, and leave without anyone noticing you, but it's still trespassing if you've been banned from the store.

jimktrains2 · on Dec 14, 2017

I actually think it would be a violation. There is a clear delineation between the private space of the coffee shop and the public space.

It becomes less clear where that delineation is not clear: a menu posted on a window or an automated phone system. These are both private things intended for at-large public consumption. My impression is that the EFF and hiq labs is taking the stance that it's a menu placed in the window, not being let in after being told you can't come in.

jimktrains2 · on Dec 14, 2017

So, I can't shine a flashlight in your store window to look at the menu in the middle of the night? I have to send photons into your "physical piece of property do something".

rayiner · on Dec 14, 2017

I don't think anyone who understands how computers work would compare the active process of a server responding to an HTTP request to the entirely passive phenomenon of shining light into a window and capturing the photons that bounce off.

jimktrains2 · on Dec 14, 2017

I could just as easy say "I don't think anyone who understands how computers work would compare the active process of a server responding to an HTTP request to a coffee shop".

But to respond directly, the paper and tape had to be bought, printed, &c. Capital was expended to place the paper there. Sure there is not the ongoing cost of maintaining this paper in the window, and if that's where your argument lies, then you should be less condescending about it.

Moreover, we're not talking about the costs associated with access, we're talking about the permission granted to access. As such, ignoring the cost of serving an HTTP request is a valid comparison, because it is not at issue here. LinkedIn's argument is just as strong even if their only argument is they denied permission with no reason given.

Thanks for the ad hominem, by the way. Your childishness and inability to conduct a civil discussion has caused this discussion to end.

mcguire · on Dec 14, 2017

But you had to take active steps to cause your physical piece of property to respond to HTTP requests...

jimktrains2 · on Dec 14, 2017

I give the example elsewhere, can I be prohibited from shining a light onto a menu you posted in your front window in the middle of the night? I had to take the active step of turning on the flashlight and sending photons onto a "physical piece of property to" bounce off the menu.

mnw21cam · on Dec 14, 2017

It's a bit like shouting in through the doorway "Hey, how much is your coffee?"

jimktrains2 · on Dec 14, 2017

If you're on public property and yelling, I would assume the coffee shop owner would need a PFA or some other court order to prevent you from access. I don't think they could have you arrested because they simply asked you to not talk to them and they aren't breaking any other laws. (Though asking you not to talk to shop employees would be necessary before the PFA could be granted as I understand it.)

IncRnd · on Dec 14, 2017

It's closer to going into the store that had sent you a C&D, then browsing the racks to create a price list.

jimktrains2 · on Dec 14, 2017

Which they could probably get enforced. Could they prevent you from looking through the window to get prices or just the sale prices being posted in the window?

IncRnd · on Dec 14, 2017

I'm not a lawyer, but I highly doubt that they could. It's also not what happened here.

jimktrains2 · on Dec 14, 2017

Can you expand on the last part? I don't view a publicly accessible webpage as a protected private space, just as I do not view an ad posted in a (private) window as a protected public space.

IncRnd · on Dec 14, 2017

Of course. My statement was predicated on the need for active network requests to obtain information. If the bot had passively listened to network traffic from LI, then I would argue for sameness with passively looking through a window.

jimktrains2 · on Dec 14, 2017

What if it's dark and I shine a light on your ad in the window? (The issue at hand isn't DOS or resource-based, but permission.)

IncRnd · on Dec 14, 2017

I agree with your premise. I'm just reaching a different conclusion.

As a permission issue, the bot _may_ have been authorized and authenticated, however the company was sent a C&D letter that revoked all authorizations. That is why I say that logging in and accessing the resources did not constitute authorizations.

If a C&D letter would not have been sent, I think I'd agree with you.

jimktrains2 · on Dec 14, 2017

You can't prevent me from looking in your window though, at a sign you put up for people to look at none-the-less, with a C&D.

IncRnd · on Dec 14, 2017

Agreed. That's why I made my earlier comment, that this is closer to your entering a store (not just looking in the window) and examininb the merchandise after you already had been sent away for trespassing, revoking all authorization.

cybwraith · on Dec 14, 2017

Again:

> your server replying 200 OK should implicitly be considered permission to access that resource

I do see your point and how you could disagree with my statement above. However, if the store owner forgets you next time and says "Come on in! Oh and here is a take-home menu with all our items and prices" but then calls the police to have you removed, there is a problem.

Now imagine said store owner actually owns several locations possibly even with different public names and doesn't want to serve said customer. They could provide a list of all addresses of stores they run explicitly banning permission. Otherwise, that customer walking into store B would need to be told again they would not be served at time of entry.

Assuming the CFAA C&D from LinkedIn does have legal standing here... If hiQ were using IP addresses and not DNS resolution to crawl, how would they know a particular IP is a LinkedIn resource they aren't allowed to access? Did the C&D provide all addresses they are not permitted to access?

My point is that its not black and white, and certainly not clear that this should be covered by the CFAA under "hacking".

Edit: You could also make the argument and analogy to a restraining order which places the responsibility for compliance on the banned party. However those don't just happen because one entity sends a letter to another entity, it needs to be explicitly granted via the legal process.

rayiner · on Dec 14, 2017

I think the more accurate comparison is that the owner sent you a C&D saying you're banned from the restaurant, and then you try to say "oh, I though the C&D didn't apply any more because the waitress let me in." Would anyone seriously believe that?

The law applies to people, not computers. The only question is: did Linked In convey its revocation of hiQ's implied license in a way a reasonable person would understand? The computer code is only relevant if a reasonable person would take the HTTP status code to take precedence over the C&D letter.

ikeboy · on Dec 14, 2017

robots.txt.

If all requests sent by robots would clearly identify themselves, the server would easily block all of them. But if they fake their user agent to look like a browser and ignore robots.txt, that's not a good faith request and they shouldn't be able to plead ignorance.

cmiles74 · on Dec 14, 2017

I don't believe there's a law requiring the honoring of the robots.txt file. People and services honor the file out of a sense of good manners, not a legal requirement.

ikeboy · on Dec 14, 2017

It doesn't have to be a specific law. It is a rebuttal to a claim of "I had no idea I shouldn't have requested millions of pages from that site".

If you scrape a site that prohibits it in robots.txt, that should be considered notice that they don't want that, for whatever relevant law. (I don't know if this argument would hold up in court, IANAL.)

cmiles74 · on Dec 14, 2017

I think I see what you're saying, but I disagree that the robots.txt file should have any legal ramifications. Web site operators have many tools that they can use to limit traffic or protect data and they should make good use of those tools.

LinkedIn wants to make their data available publicly, except under certain conditions. In my opinion, if they can't find a technical solution, they should stop making the data available publicly.

jimktrains2 · on Dec 14, 2017

What is a robot? Why is the User Agent even important? It's not a standardized value. I could send "User-Agent: ikeboy" and it's perfectly valid.

xoa · on Dec 14, 2017

>Just because the cafe on the corner has its door open and lets you stroll in off the street doesn't mean that the property owner doesn't retain the right to exclude people.

While I don't know about the EFF's overall argument, as an absolute statement I don't think you are correct here. In the USA at least, "Public Accommodations" (which your cafe example would be) are in fact subject to regulations that limit their ability to discriminate, require accommodations for the disabled, etc., and these apply regardless of whether it's public or private property. Something that is open to the general public is different in law then purely private property (private clubs and religious institutions are specifically excluded from federal law, but that's it). There are also going to be different expectations of privacy and default access levels.

Physical to digital analogies are often a poor match anyway, but in this case I'm not sure even if we accept one that it fully supports your point. Private property open to the public is not legally the same as purely private limited access property in terms of who it may exclude, when it may exclude, and why (as well as lots of other standards).

rayiner · on Dec 14, 2017

Laws against discrimination don't turn private property into quasi public property. They are narrow exceptions to the way in which property owners exercise their right to exclude.

Neither the corner cafe nor Linked In can refuse to serve a request by someone because the person is black. But both the corner cafe and Linked In can refuse to service someone for any non-discriminatory reason, such as say because they're a Michigan fan.

ams6110 · on Dec 15, 2017

Or, more reasonably, because they are refusing to comply with some expectations for behavior that apply to everyone there.

choward · on Dec 14, 2017

Are you just trying to play devil's advocate or do you really believe this? With HTTP, you're requesting access and then the server gives you the some information. It's up to the server to decide to give you the information. If the server doesn't give you the info, you can try to hack it but you might be breaking the law.

Same with a cafe. You can request access and the cafe can turn you away or serve you. If they turn you away and you refuse to leave, then you are breaking the law (like hacking).

Basically if I request something from you and you give it to me, that's your problem, not mine.

ynniv · on Dec 14, 2017

Trespass is not illegal until the owner informs you that you are not wanted. Private information that has accidentally been made public is like an unmarked field. It may be private, it may be public, but until the owner takes specific action it is not illegal to use the field. If the owner decides to take action, that action cannot be retroactively applied, even if there is a record of who used the field.

Regardless, this is not analogous. If LinkedIn is making information public, then they cannot simultaneously say that this information is private for a specific use and expect the courts to intervene.

UncleEntity · on Dec 14, 2017

The difference is Linkedin knows they're scraping the site, asked them to stop and is now trying to force them to stop through the courts (in a really bad way).

ynniv · on Dec 14, 2017

Google, Bing, etc are also scraping their site, and I see no cease and desist order there. Make Googlebot authenticate itself, or admit the data is publicly accessible.

UncleEntity · on Dec 14, 2017

The Whataburger I went to for breakfast this morning gives some homeless people free coffee and asks others to leave...

ynniv · on Dec 14, 2017

And when hiQ shows up looking homeless and accepts the gift of coffee, they are committing a crime?

UncleEntity · on Dec 14, 2017

No, but when they ask them to leave and they still take a coffee cup they are.

But that's not the point, the point is it's possible to give something for free and also refuse to give it to everyone under any circumstance.

They didn't give me a free cup of coffee and someone could reasonably mistake me for a homeless person based on my (lack of) fashion sense but that doesn't mean I could just reach over the counter and grab a cup because I saw them give one to somebody else when I walked through the door.

ynniv · on Dec 14, 2017

When a server sends you a response you aren't taking something, you are being given something. If the server thought you shouldn't have it, it wouldn't give it to you.

How can you say that hiQ isn't allowed to have this, but everyone else is allowed to take as much as they like? All that will happen is hiQ will create a string of shell companies that accesses LinkedIn as their proxies, and you will be wasting the court's time. Step zero is to establish that no one can have access unless authorized, and LinkedIn refuses to do this.

UncleEntity · on Dec 14, 2017

> When a server sends you a response you aren't taking something, you are being given something. If the server thought you shouldn't have it, it wouldn't give it to you.

That's not even a rational argument, ask some hacker sitting in prison how well that one went over.

> How can you say that hiQ isn't allowed to have this, but everyone else is allowed to take as much as they like?

Umm, private property? Terms of service? Take your pick...

> All that will happen is hiQ will create a string of shell companies that accesses LinkedIn as their proxies, and you will be wasting the court's time. Step zero is to establish that no one can have access unless authorized, and LinkedIn refuses to do this.

hiQ isn't fighting the validity of giving access to some people while denying them access to the very same data they are fighting a misapplication of a totally unrelated law (because it's the right thing to do).

This whole thing isn't about denying them access but "hiQ challenged LinkedIn’s attempt to use the CFAA as a tool to enforce its terms of use in court."

ynniv · on Dec 14, 2017

"Hacking" involves subverting authentication systems, which is a type of fraud. When there is no authentication system there can be no "hacking", and the CFAA should not be applicable.

The data itself isn't LinkedIn's property (argued elsewhere), so they don't have control over it after it leaves their servers.

This is wandering... please decide whether you want to argue the article, the case, or hypothetical free coffee.

kodablah · on Dec 14, 2017

I think this is a poor analogy. An argument/analogy like yours would allow me to say you trespassed with your eyeballs. That light travels one way or bytes another doesn't affect the spirit that you are looking at something that was made available to look at. Can I outlaw window shopping? My cafe is going to have a sign that says "If you are employed by a competing cafe and you don't close your eyes when walking by, I will attempt to have you jailed". Or I'll wait until they show a price comparison between their coffee and ours, then send a cease and desist to have them not look at my billboard anymore.

_liz2 · on Dec 14, 2017

> When you make an HTTP request, you're accessing a piece of private property. The owner of that property has every right to decide not to let you do so.

It can do exactly that. It can respond with an error code or start dropping packets entirely. As far as I'm aware, LinkedIn didn't do that.

Any access to LinkedIn's data requires that LinkedIn send it in a response. If LinkedIn is sending it in a response, LinkedIn can't claim that it's not authorized.

mcguire · on Dec 14, 2017

I am not a lawyer and I do not have citations to back this up, but I suspect that, if you put up a billboard and then send cease and desist letters to people looking at it, or taking pictures of it, or whatever analogy to programs examining public web pages you like, then you would be laughed out of court.

cmiles74 · on Dec 14, 2017

I find this argument to be a poor fit for the actual situation. The person that owns a coffee shop needs to let people physically enter their coffee shop in order to purchase coffee, snacks, etc. LinkedIn has no such requirement, they can easily require people establish and log into registered accounts in order to access their data. As you have said, their servers are their property and they have the ability to block access for anyone that they do not wish to serve.

This is entirely different. LinkedIn wants to make the data available on the public internet... Except sometimes. They can't figure out a technical solution so they are pushing for a legal solution. If you'd like to try to further your coffee shop argument, this seems more like a coffee shop giving away free coffee with a notice letting customers know that there's a limit of three free coffees per person and then being shocked when some customers take four or five. Or all of them.

huebnerob · on Dec 14, 2017

1. If a business offers me one product for free and I take two, that's theft, plain and simple. I'm not sure what your analogy was meant to prove but I think it actually makes a stronger case for the counterpoint to your argument.

2. LinkedIn has every right to define what the use policy is for information it makes available publicly through its own product. In this case, the policy was violated, and the violator was notified through appropriate channels that they were in violation. They continued to access LinkedIn and violate the policy, which is illegal. The critical distinction is that what they were doing only became illegal when LinkedIn notified them that they were in violation of the policy, no longer welcome on the site, and they continued to do what they were doing anyway.

cmiles74 · on Dec 14, 2017

If you leave things out in an open, public space without any access controls, those things are likely to be taken. A note that says "please don't take this" isn't going to change anything and I find it unlikely that you could pursue anyone legally on the grounds of "but I left a note".

LinkedIn has every right to define their use policy through technical means. If they want to make it publicly available, then they understand part of that public is their competitors. In my opinion, website operators should not get any legal protections for things they can easily do themselves through readily available technical means.

I wholeheartedly disagree that LinkedIn has any right to define the use policy for data it makes publicly available. A wide variety of data is available to the public and you can't simply sue people who use that data in a way that you dislike. If you would like to keep that data private then do so.

Merad · on Dec 14, 2017

> When you make an HTTP request, you're accessing a piece of private property. The owner of that property has every right to decide not to let you do so.

So why don't they do that? If they're responding to a bot's HTTP requests with content, they are choosing to give the bot access.

jimktrains2 · on Dec 14, 2017

Sure, but if you're never told to leave the coffee shop or no action is taken to prevent you from entering again, say being told your banned, and you continue to walk in and use the coffee shop with no one saying anything, has your permission to enter really been revoked, even if the owner thinks, and only thinks, it has been?

jasonlotito · on Dec 14, 2017

FTA: "LinkedIn sent hiQ cease and desist letters warning that any future access of its website, even the public portions, were “without permission and without authorization” and thus violations of the CFAA."

They were formally told to leave the coffee shop and not return.

jimktrains2 · on Dec 14, 2017

No, they were told to stop looking in the window. Can you ask someone not to look at the menu you posted in your sidewalk window? I don't think that'd hold up in court.

jasonlotito · on Dec 14, 2017

Now you are just changing your analogies because you didn't read the article.

confounded · on Dec 14, 2017

> Just because the cafe on the corner has its door open...

Hello, is there a cafe here?

Yes, here’s some coffee! Anyone who asks gets some!

Thanks, I acknowledge receipt!

I’ve changed my mind, I shouldn’t have been giving out coffee! What kind of a business is this? The only way my actions make sense is if you’re a thief. Theif! I will now try to ruin your life via the legal system.

guywaffle · on Dec 14, 2017

Not true. When you make a HTTP REQUEST, you’re not accessing a piece of private property. You are requesting information. Just because it is requested doesn’t mean it has to be served.

interfixus · on Dec 14, 2017

Fine. I hereby forbid access by any entity owned, operated, or otherwise controlled by Microsoft Corporation to any internet server or service operated by me. Disregard of this interdiction shall be considered a crime, the digital equivalent of trespassing.

amyjess · on Dec 14, 2017

If you can demonstrate that Microsoft is aware of their ban from your property, then you absolutely would have a case.

joshuamorton · on Dec 14, 2017

And this is why we have judges and juries.

interfixus · on Dec 14, 2017

Actually, where I live, we don't have juries.