Hacker News

The technical side is straightforward but the legal implications of trying passwords to try to scrape content behind authentication could pose a barrier. Using credentials that aren't yours, even if they are publicly known, is (in many jurisdictions) a crime. Doing it at scale as part of a company would be quite risky.




The people in the mad dash to AGI are either driven by religious conviction, or pure nihilism. Nobody doing this seriously considers the law a valid impediment. They justify (earnestly or not) companies doing things like scraping independent artists' bread-and-butter work to create commercial services that tank their market with garbage knockoffs by claiming we’re moving into a post-work society. Meanwhile, the US government is moving at a breakneck pace to dismantle the already insufficient safety nets we do have. None of them care. Ethical roadblocks seem to be a solved problem in tech, now.

The legal implications of torrenting giant ebook collections didn't seem to stop them, so I'm not sure why this would.

The law doesn't directly stop anyone from doing anything; it acts much differently from a technical control. The law provides recourse to people hurt by violations and enables law enforcement action. I suspect Meta has since stopped their torrenting, and may lose the lawsuit they currently face. Anyone certainly could log in to any site with credentials that are not their own, but fear of legal action may deter them.

Not criminal law

There is independent enforcement that should apply


Going back to Napster, hasn't the gray area always been in downloading versus uploading?

If anyone could show that LLM companies have been uploading torrents then they really would be in trouble. If they are only proven to have downloaded torrents they're walking the line.


> but the legal implications of trying passwords to try to scrape content behind authentication could pose a barrier

If you're doing something akin to cracking then yeah. But if the credentials are right there on the landing page, and visible to the public, it's not really cracking anymore since you already know the right password before you try it, and the website that put up the basic auth is freely sharing the password, so you aren't really bypassing anything, just using the same access methods as everyone else.

Again, if you're stumbling upon basic auth and you try to crack it, I agree it's at least borderline illegal, but this was not the context in the parent comment.


> freely sharing the password

It doesn't have to be so free. It can be shared with the stipulation that it's not used in a bot.

https://www.law.cornell.edu/uscode/text/17/1201

  (a) Violations Regarding Circumvention of Technological Measures.—
    (1)
      (A) No person shall circumvent a technological measure that effectively controls access to a work protected under this title.
This has been used by car manufacturers to deny diagnostic information even though the encryption key needed to decrypt the information is sitting on disk next to the encrypted data. That's since been exempted for vehicle repairs but only because they're vehicle repairs, not because the key was left in plain view.

If you are only authorized to access it under certain conditions, trying to access it outside those conditions is illegal (in the US, at minimum). Gaining knowledge of a password does not grant permission to use it.


If I was assigned the task of arguing that in court (though it would be really stupid to assign me, a non-lawyer, that task), I'd probably argue that it's not circumventing a locked door when you use the actual key in the lock; "circumventing" refers to picking the lock. It could still be unauthorized access if you stole the key, but that's a different thing than circumventing, and this law forbids circumventing.

Likewise, if the encryption key is sitting on disk next to the encrypted data, it's not "circumventing" the encryption to use that key. And if you handed me the disk without telling me "Oh, you're only allowed to use certain files on the disk" then it's fair to assume that I'm allowed to use all the files that you put on the disk before handing it to me, therefore not unauthorized access.

That argument might fail depending on what's in the EULA for the car's diagnostic software (which I haven't seen), but I feel it would be worth trying. Especially if you think you can get a sympathetic jury.


To be fair, even ignoring the robots.txt is illegal in most western countries. I was a technical witness a while back, for a case about a bot ignoring the robots.txt. I said it was akin to a peeping tom ignoring a "no trespassing" sign, creeping into someone's backyard, and looking through their window. Yes, they actually did bypass security controls, and therefore illegally "hacked" the site by ignoring it.
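For anyone wondering what "not ignoring it" looks like on the bot side, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `ExampleBot` user agent, and the `is_allowed` helper are made up for illustration; a real crawler would fetch the site's own file before requesting anything else.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, used only for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a sequence of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/public/page.html", "/private/page.html"):
        url = "https://example.com" + path
        print(path, is_allowed(ROBOTS_TXT, "ExampleBot", url))
```

A bot that runs a check like this before every request and skips disallowed URLs is honoring the file; whether skipping the check is unlawful is the legal question being argued here, not something the code settles.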

Huh, that's interesting, I'm not too familiar with US law, so not surprising I didn't know that :) Time to look up if it works similarly in my country today. Last time I was involved with anything slightly related to it was almost two decades ago, and at that point we (as a company with legal counsel) made choices that assumed public info was OK to use, as it was public (paraphrased from memory), but it might look different today.

Thanks for adding the additional context!


How is this different than skipping the password and leaving the same terms of use for the content itself?

OTOH if, as a human, you use a known (even leaked on the website) password to "bypass the security" in order to "gain access to content you're not authorized to see", I think you'd get in trouble. I'd like it if the same logic applied to bots: implement basic (albeit weak) security and only allow access to humans. This way bots have to _hack you_ to read the content.

> you use a known (even leaked on the website) password to "bypass the security" in order to "gain access to content you're not authorized to see", I think you'd get in trouble

I agree, but if someone has a website that says "This isn't the real page, go to /real.html and when authentication pops up, enter user:password", then I'd argue that is no longer "gaining access to content you're not authorized to see", the author of the page shared the credentials themselves, and acknowledged they aren't trying to hide anything, just providing a non-typical way of accessing the (for all intents and purposes, public) content.


Sure, it’s a crime for the bots, but it would also be a crime for the ordinary users that you want to access the website.

Or if you make it clear that they’re allowed, I’m not sure you can stop the bots then.


I don't think it'd be illegal for anyone.

The (theoretical) scenario is: There is a website (example.com) that publishes the correct credentials, and tells users to go to example.com/authenticate and put those there.

At no point is a user (or bot) bypassing anything that was meant to stop them, they're following what the website is telling them publicly.
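To make the mechanics concrete, here is a rough sketch of what a client (human browser or bot) sends in that scenario, assuming the site uses plain HTTP Basic authentication. The `user`/`password` pair and the example.com URL are placeholders taken from the hypothetical above, and the helper names are invented for this sketch.

```python
import base64
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    # Basic auth is just "username:password", base64-encoded (not encrypted).
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def build_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Prepare (but do not send) a request carrying the published credentials."""
    request = urllib.request.Request(url)
    request.add_header("Authorization", basic_auth_header(username, password))
    return request


if __name__ == "__main__":
    # Placeholder URL and credentials from the hypothetical scenario.
    req = build_request("https://example.com/authenticate", "user", "password")
    print(req.get_header("Authorization"))
```

The request is byte-for-byte the same whether a person typed the credentials into the browser prompt or a script attached the header, which is why the technical side doesn't distinguish the two cases.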


I think this analysis is correct. The part you're missing from my comment is "at scale", which means trying to apply this scraping technique to other sites. As a contract security engineer I've found all kinds of accidentally leaked credentials; knowing whether a set of credentials was accidentally leaked or is being intentionally disclosed to the public feels like a human-in-the-loop kind of thing. Getting it wrong, especially when automated at scale, is the risk the bot writer needs to consider.

Same goes for human users. The real way to avoid bots is actual login credentials.

There are hundreds of billions of dollars behind these guys. Not only that, but they also have institutional power backing them. The laws don’t really matter to the worst offenders.

Similar to OP's article, trying to find a technical solution here is very inefficient and just a band-aid. The people running our society are on the whole corrupt and evil. Much simpler (not easier) and more powerful to remove them.



