Facebook stored millions of Instagram passwords in plain text (theverge.com)
206 points by mattkevan on April 18, 2019 | 87 comments



Literally every single time Facebook announces something like this, they always announce low numbers and then repeatedly revise them upwards.

Is there no penalty to them for basically lying to everyone in their initial announcement? Doing it once could be explained as them sincerely thinking their original number was correct, but they have a history of doing this, and at this point there's no chance I believe they aren't doing it on purpose.


I think revising upwards is them looking for other bugs of the same type, and finding lots.

I can totally imagine someone on the Instagram photo pinch-to-zoom team saying "Whoa - if someone re-logs in while a photo is partially zoomed, our metrics capture the password. That's bad!". They find that's super-rare, and only impacts 10k users.

Later, other logfiles are grepped for passwords (possibly even by doing a fulltext search across all fields and blobs for common passwords), and more and more are found.

Then someone says "but some of our data is in fields base64 encoded - did you try grepping for base64 encoded plaintext", and even more are found, etc.


You're giving them far too much credit.


Benefit of the doubt.


Benefit of the doubt is something you give to a company the very first time it does something like this. Not something you give the umpteenth time they do something like this, and especially not when they try and hide their "sorry it was waaaay worse" update amid other high-profile news.


> Then someone says "but some of our data is in fields base64 encoded - did you try grepping for base64 encoded plaintext", and even more are found, etc.

This is painfully accurate. I've encountered exactly this kind of thing, including having to repeat search efforts.


This seems like it would happen with most breaches.

You learn about an issue, do some investigation, get an estimate, then inform the public. If Facebook had waited months to complete the investigation and get these numbers, do you think people would have accepted that? Probably not. So they publish what they know when they think they've got it figured out. But this probably spins up a team that goes looking for other bad logging. Then a while later they find some more errors and the numbers are revised up.

This seems expected.


I can't think of any other company that has a history of downplaying breaches and then revising the numbers upwards. If you don't know how many people are impacted, you aren't supposed to just make up a number. At a bare minimum you'd say something like "We know 10k users are definitely impacted, and we estimate this could be as high as 5M" or whatever the appropriate ballpark is.


LinkedIn went from 6 million to 107 million...


Ouch. Is this a one-time thing, or do they have a history of that?

What I was trying to say was that I can understand companies making this mistake once. Maybe even twice. But once a company has an established pattern of doing this, they lose the benefit of the doubt and we have to assume they're doing it deliberately. Even more so if they do scummy things like try to bury their "it's 1000x worse" update with other high-profile news.

Facebook has a long history of this kind of thing now; every single new privacy violation of theirs must be met with the expectation that it's actually 1000x worse than they say. I'm not aware of any other companies with a history of doing this multiple times.


New rule: whenever facebook announces a breach and scale of effect, just square the scale of effect right off the bat. You'll probably still be a little low, but you'll be a lot closer to the number in the second (or third or fourth) press release.


1e6^2 = 1e12, trillions of passwords? ...


"The security lapse was first reported last month, but at the time, Facebook said it only happened to “tens of thousands of Instagram users,” whereas the number is now being revised up to “millions.” The issue also affected “hundreds of millions of Facebook Lite users” and “tens of millions of other Facebook users.”"


I wonder how much of the responsibility lies in how Instagram was built. Instagram may have been built so fast that those things were never accounted for.

Not that I'm giving Facebook excuses, but maybe something was overlooked before acquisition. Then again, Facebook is breaching security protocols left and right.


1e12 square passwords.


"...various errors seem to have caused Facebook’s systems to log some passwords in plain text since as early as 2012."

1) As a software engineer I can't imagine how such errors could possibly have entered production code accidentally, especially after code review. If precise details of these errors are released, I am open to having my mind changed.

2) Even if it did, I can't see how it would take Facebook's tens of thousands of engineers 7 years to find this "bug".


As a mere programmer I can very easily imagine dozens of ways such errors could easily enter production code accidentally.


With my admin hat on, I don't see how I would miss leaking millions of passwords to logs, since 2012. Unless I never reviewed them.


If they logged complete requests, the password might well be buried somewhere deep in request body, and not readily apparent to whoever is skimming the logs - especially if this doesn't happen for all requests, but only some subset of them.


Wouldn’t it be pretty simple (for FB) to create accounts with a super-unique password as part of their test process? Then they scan all their storage for the super-unique password? Or am I missing something?


To implement, yes. But you'd have to think about this as a problem first. The very fact that we're having this discussion here now shows that it's not all that obvious, at least for your average dev. How their privacy and/or security review missed it is a more interesting question.


TIL facebook hires average devs.


Well combine this with them releasing a "feature" to ask users for their email passwords and you have to wonder.


That test user also would have to hit every nook and cranny of their code. That’s hard, and may not be repeatably possible (some code may only run at first login, or when making your 1024th post, or if you’re posting a movie before you ever post a photo, or when a single server sees its millionth post, etc)

They also would have to run it in such a way after every code update. Since every code update may introduce a new nook or cranny, that would slow down development too much.

I wouldn’t even try doing that, but instead have a two-pronged defense:

- code review of every single log statement in the code base by individuals whose _only_ job it is to prevent such problems.

- permanent checking of every single line logged for a thousand or so common passwords.

At Facebook’s scale, the second probably has lots of false positives, so I would do that at time of logging, when the location doing that logging is known, so that an alarm only gets triggered if a single log statement repeatedly logs passwords.
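
A minimal sketch of that second prong in Python; the password list, the threshold, and the alert_security() hook are illustrative placeholders, not anything Facebook has described:

  import logging
  from collections import Counter

  # Illustrative list; in practice this would be the top ~1000 leaked passwords.
  COMMON_PASSWORDS = {"123456", "password", "qwerty", "letmein", "iloveyou"}

  ALERT_THRESHOLD = 5            # only alarm if one log statement matches repeatedly
  _hits_per_callsite = Counter()

  def alert_security(callsite):
      # Placeholder alert hook: page the security team / open a ticket.
      print(f"possible password logging at {callsite}")

  class PasswordCanaryFilter(logging.Filter):
      def filter(self, record):
          if any(pw in record.getMessage() for pw in COMMON_PASSWORDS):
              callsite = (record.pathname, record.lineno)  # the offending log statement
              _hits_per_callsite[callsite] += 1
              if _hits_per_callsite[callsite] == ALERT_THRESHOLD:
                  alert_security(callsite)
          return True   # never suppress the line itself; we only want the alarm

  handler = logging.StreamHandler()
  handler.addFilter(PasswordCanaryFilter())
  logging.getLogger().addHandler(handler)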


Do you know of any companies doing that?


Yes, I do. They’re called canaries by the security teams. It’s especially useful for when you run automated tests when pushing updates in your pipeline. Scan the logs and database entries for the canaries.
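
For example, a pipeline step along these lines, where the canary value and the log location are made-up names for illustration:

  from pathlib import Path

  # Hypothetical canary: the password used by the automated test account and nothing else.
  CANARY_PASSWORD = "canary-xk9-e60b1c"
  LOG_DIR = Path("/var/log/app")   # wherever the test run's logs end up

  def find_canary_leaks():
      leaks = []
      for log_file in LOG_DIR.rglob("*.log"):
          if CANARY_PASSWORD in log_file.read_text(errors="replace"):
              leaks.append(log_file)
      return leaks

  # Fail the deploy pipeline if the canary shows up anywhere it shouldn't.
  leaks = find_canary_leaks()
  assert not leaks, f"canary password leaked into: {leaks}"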


Thank you, didn't know what they're called.


If someone logs a full request, the first thing that should come to their minds is the privacy.

Someone who works as a Dev on Facebook should know that HTTP requests can contain sensitive data, session cookies, passwords, credit card information, etc.


Probably because you don't work on systems anywhere near the scale of Facebook. Facebook has 2.3 billion MAUs, so if they had some edge case that hit 0.1% of users, that's already 2.3 million users.


And what does scale have to do with logging any plaintext passwords?


Because it was a bug. They most likely weren't doing this:

log(request.password);

But they had some code that did this:

log(excludeSensitive(request.headers));

But then someone on a different team changed how passwords were sent so excludeSensitive was broken.


Scrubbing passwords and other sensitive information from logs is such a basic requirement. It’s something you write a regression test for and never think about again.
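
For example, a regression test along these lines, assuming a hypothetical scrub_sensitive() helper in your own codebase and field names picked for illustration:

  import json

  SENSITIVE_KEYS = {"password", "passwd", "credit_card", "session_token"}

  def scrub_sensitive(payload: dict) -> dict:
      # Replace sensitive fields with a placeholder before anything gets logged.
      return {k: ("[FILTERED]" if k in SENSITIVE_KEYS else v) for k, v in payload.items()}

  def test_password_never_reaches_logs():
      request_body = {"username": "alice", "password": "hunter2"}
      logged = json.dumps(scrub_sensitive(request_body))
      assert "hunter2" not in logged
      assert "[FILTERED]" in logged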


As a manager, I can tell you we would never do such a thing, and if we did, then we never will again, and if we do, never again after that.


Not too far-fetched... For example, I've seen people put logging systems in place that logged the whole request body, and then someone changed a field name and all of a sudden, as a side effect, the code that sanitizes the body to remove sensitive information no longer did its job.


I liked the suggestion of scanning logs for known passwords to detect leaks.


It’s a good start at least. You still have to worry about the password being in the middle of a base64 encoded string and weird cases like that but at least you could grep for some known things.


1. You decide that logging passwords is bad.

2. You create a system that detects high entropy content before being logged.

3. You don't want to drop all high entropy content, so you create some rules about where in requests to look for high entropy content.

4. Something about the request structure changes, breaking your log filtering.

5. There is nothing that notices the drop in the amount of content filtered out of logs.

There are oodles of ways this could happen. I'd wager that more than half of all businesses that have a website that handles passwords have logged passwords in plaintext somewhere.
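
A toy version of steps 2 and 3, using Shannon entropy over characters; the threshold and minimum length are arbitrary, and in practice many real passwords are not high-entropy enough to trip them anyway:

  import math
  from collections import Counter

  def shannon_entropy(s: str) -> float:
      counts = Counter(s)
      return -sum((c / len(s)) * math.log2(c / len(s)) for c in counts.values())

  def redact_high_entropy_tokens(line: str, threshold: float = 3.5, min_len: int = 12) -> str:
      # Steps 2/3: only tokens that are long enough AND high-entropy get dropped.
      # This is exactly the rule set that can silently stop matching (step 4)
      # when the request structure changes.
      return " ".join(
          "[REDACTED]" if len(tok) >= min_len and shannon_entropy(tok) > threshold else tok
          for tok in line.split()
      )

  print(redact_high_entropy_tokens("login user=alice pw=J8#kf2!qZm9xT4"))
  # -> login user=alice [REDACTED]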


> You create a system that detects high entropy content before being logged.

Unfortunately, as numerous password breaches have shown, most passwords aren't that high entropy.


Perhaps checking logs for the N most common passwords could also work? At Facebook's scale this would very likely trigger some positives given such a bug.


"1) As a software engineer I can't imagine how such errors could possibly have entered production code accidentally"

Like, I'd love to agree, but frankly, I'm not surprised. I feel like I've seen crazier things happen on production environments...


During a job interview at one mid-size startup, one of the interview questions involved handing me a two-page excerpt of server logs and asking me to identify bugs/issues from the messages and tracebacks in the logs. (Neat idea!)

The logs contained user credentials, and they hadn't noticed. I pointed it out to the CTO.


Hey, in the wild! I've long advocated for companies to use actual bugs (that have been fixed) in testing candidates. Heck, as your anecdote suggests, they might find the candidate provides a fix better than their ninja rockstar did, and in this way could serve as part of a performance review for existing employees. ;)


Wow, their interview question (in the hands of the wrong interviewee) could have caused a data breach.


If it's a plaintext password, the minute it hits any human eyes, it is IMO compromised and the user should be required to change it. Employee, Interviewee, Hacker - doesn't matter.


I'd go a step further. IMO the second it hits a hard drive it's compromised so just the act of having that in the log would count.


This happens all the time, likely as the side-effect of some other activity - instrumentation related to exception handling / error monitoring, performance analysis, debugging, etc.


My assumption since the start of this is that it's telemetry / server log data. Honestly, I came really close to doing this exact thing.

My engineer: "Should we log the body of api requests?"

Me: "Yes, of course!....... waaaait a second. No."

The result was manually whitelisting the fields that SHOULD be logged on each endpoint. It's a pain compared to "just log everything" or even a blacklist, but it's far safer.
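
Something like this, with endpoint paths and field names invented for the sake of the example:

  # Per-endpoint allowlist: only these request-body fields are ever logged.
  LOGGABLE_FIELDS = {
      "/api/login": ["username", "client_version"],
      "/api/upload_photo": ["photo_id", "filter_name"],
  }

  def loggable_body(endpoint: str, body: dict) -> dict:
      allowed = LOGGABLE_FIELDS.get(endpoint, [])   # unknown endpoints log nothing
      return {field: body[field] for field in allowed if field in body}

  # The password simply never makes it into the logged dict:
  print(loggable_body("/api/login", {"username": "alice", "password": "hunter2"}))
  # -> {'username': 'alice'}

The nice property is that a renamed or newly added sensitive field fails closed: it doesn't get logged until someone explicitly adds it to the list.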


>1) As a software engineer I can't imagine how such errors could possibly have entered production code accidentally, especially after code review. If precise details of these errors are released I am open to have my mind changed.

You can't imagine? Really? Error occurs, request that caused error gets logged.


I think Rails had this issue by default for some time.


Obviously you shouldn't log passwords, but what kinds of security mechanisms have been effective at catching this error?

One thing I can think of is have some production test accounts that are regularly used and have a unique password. Then have an automated task that periodically greps the production logs for the password to see if you have a log leak.

Any other approaches?


I was also thinking about using canaries like this to detect database leaks - basically insert fake users into your DB with random emails, and if you see that somebody is trying to email those accounts or log in to them, you know you have been compromised. Not sure if anybody has ever used something like this...


It’s a legit strategy. I’ve heard it called a “honeypot”


You could grep production logs and other data stores for commonly used passwords like "password123".


More dependable would be tests that inject unique privileged information and then check for that appearing anywhere it shouldn't.


You need to do both.

Real users will find routes through your app that automated tests never consider. For example, what if the user's 1-year login cookie expires as they're on step 3 of a 4-step "change your avatar" process? That's highly unlikely, and probably not even tested. Yet it might well work (due to good modular design), but also log inappropriate data.


Better hope data isn't encoded in some other format.

Better hope you have an exhaustive list of all the production logs.


You would miss partial or manipulated passwords (e.g. base64 encoded) with that approach, but it's a first step.


I actually really like that idea, constantly scanning logs for hyper-specific red flags. I might start doing this.


I do this already, both for user passwords and for accidental leaking of internal company data into logfiles.

It also checks for base64 encoded versions of the data (with various alignments). There is also an alert if data is unscannable (due to compression or encryption).

The check is done at log ingestion points, but also on outgoing HTTP requests from webdriver automated tests (since some third-party scripts might be shipping the data off to someone else's server).

The scanned for words are:

* the top 100 passwords, excluding things used as test strings
* a few company-specific passwords
* a few testing passwords
* a few random strings which are also deliberately inserted into source code files in places that should never (by design) pass between client and server
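
One way to handle the "various alignments" part, sketched roughly (the stripping offsets follow from base64's 3-bytes-to-4-chars grouping; a real implementation may differ):

  import base64

  def base64_needles(secret: bytes):
      # A secret embedded at an arbitrary offset can encode three different ways
      # depending on offset % 3. For each case, strip the leading chars that mix
      # with preceding bytes and the trailing group that mixes with following
      # bytes; what's left always appears verbatim in the encoding of any data
      # containing the secret.
      needles = set()
      for offset in range(3):
          enc = base64.b64encode(b"\x00" * offset + secret).decode()
          lead = {0: 0, 1: 2, 2: 3}[offset]
          core = enc[lead:len(enc) - 4]
          if core:
              needles.add(core)
      return needles

  def leaks_secret(log_text: str, secret: str) -> bool:
      return secret in log_text or any(n in log_text for n in base64_needles(secret.encode()))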


Thanks so much for sharing.


OK, this is getting ridiculous. There are no best practices being followed. You'd think a company worth billions would have a standard of ethics.

After all, that's what they interview on. Best practices. Ethics.

Doesn't Facebook have blogs on things like efficient servers, algorithms and such? They are obviously at the forefront of technology (or know about it). Why aren't they implementing these things?

It seems to me that the more your company is worth, the less you get to bend rules, all to chase more money. HR and lawyers will take care of the rest.


Maybe instead of quizzing candidates on obscure sorting algorithms at interviews, they should have simple yes/no questions such as:

Should you store plaintext passwords on the server side?


Earlier post of the exact same story.

https://news.ycombinator.com/item?id=19693233


And they announced it while they thought the attention was elsewhere (due to the Mueller report release), and in an easy-to-miss manner (as an update to a blog post). Very trust inspiring.


Edit: The following is correct but misleading. The Mueller report's release had been publicly announced before the Facebook security update.

Facebook announced this before the Mueller report was released.

The Facebook security post was updated April 18, 2019 at 7AM PT [1].

The Mueller report was released "Thursday morning, shortly after 11 am [EDT]" [2]

[1] https://newsroom.fb.com/news/2019/03/keeping-passwords-secur...

[2] https://www.vox.com/2019/4/18/18411966/mueller-report-releas...

Disclosure: I work for a big tech company but not Facebook. All opinions are my own.



I wasn't aware of this. I updated my comment to reflect this.


In a way this is even more worrying (knowing that the Mueller report release time was widely known beforehand).

It's probably "safer" for Facebook to make the announcement with the guaranteed disruption of the Mueller report release afterwards than it is for them to release it after the report is out, since they can't predict how long people will be focused on the report. I can imagine a conversation along the lines of what I just wrote happening at Facebook, and that's a really disturbing level of coordination to hide a serious incident.


Barr's press conference was at 9:30 AM EDT, however.


I wasn't aware of this. I updated my comment to reflect this.


How does Facebook, hiring some of the best engineers in the industry with the biggest bucks, end up in such a mess?


But think of the whiteboard algorithms they can all put together in a pinch


maybe they need passwords to log in as a user?


Even if they want to emulate a user, they can do so without the password. Or, if doing so in a simplistic way, they can log in with said user's hashed password instead of the original plaintext.


I feel like this could be avoided by hashing the password on the client side as well before sending it to the server, no?


Yes, but while "storing passwords in plaintext is bad, mkay?" is accepted in the industry, "transmitting passwords to the server is bad, mkay?" is, in fact, not accepted. It is usually considered "best practice" to transmit passwords to the server, which is obviously wrong.


That doesn't work. If you hash on the client then the hash IS the "password" and is thus sensitive info that you'd need to hash again on the server and shouldn't be stored anywhere in plain text.


While I agree with what you say, one could argue there is the additional element of protecting against user password reuse between sites.

If the client-side hash is strong enough, it means that your plaintext password leak becomes a hashed password leak, which at least protects the user from reuse on other sites.


I’d argue it is “best practice” to present credentials to an authentication server, and provide a token to the various user services to reduce the impact of things like over-logging.


That defeats the purpose of hashing. In your model, the hashed password is all an attacker needs to log in, so the hashed password essentially IS the password. Thus, when you store hashed passwords, you are doing the equivalent of storing plaintext passwords.


No, because if you salt it then that password becomes unique to your website. Since users tend to reuse passwords across websites, the password becomes far less valuable if it is already salted and hashed once first before sending to the server for a second round of salting and hashing.
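
A rough sketch of that scheme, using a fixed site-specific string as the client-side salt and the third-party bcrypt package on the server; names are illustrative, and for a real design something like SRP (mentioned below) is the better-studied option:

  import hashlib
  import bcrypt   # third-party package, assumed available on the server

  SITE_TAG = b"example.com/login/v1"   # fixed, site-specific client-side salt

  def client_side_hash(password: str) -> bytes:
      # What leaves the client; the raw password never does. 32-byte digest, hex-encoded.
      digest = hashlib.scrypt(password.encode(), salt=SITE_TAG, n=2**14, r=8, p=1, dklen=32)
      return digest.hex().encode()

  def server_store(client_hash: bytes) -> bytes:
      # The client hash is still effectively "the password" for this site,
      # so it gets its own per-user salt and slow hash before storage.
      return bcrypt.hashpw(client_hash, bcrypt.gensalt())

  def server_verify(client_hash: bytes, stored: bytes) -> bool:
      return bcrypt.checkpw(client_hash, stored)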


"Far less valuable" depends on whether access to your site is particularly valuable. If you hash passwords once on the client and twice on the server, your site itself is no more secure than if you only hashed passwords once on the server (and not on the client). If you hash passwords once on the client and once on the server, your site itself is drastically less secure. The secondary protection of shared passwords on other sites, from a narrow class of data compromise involving request logs, is not the purpose of hashing, so you have not actually contradicted my statement.

If you are concerned about transmitting passwords, use SRP (https://en.wikipedia.org/wiki/Secure_Remote_Password_protoco...) instead of inventing your own bad crypto.


>If you hash passwords once on the client and twice on the server, your site itself is no more secure than if you only hashed passwords once on the server (and not on the client). If you hash passwords once on the client and once on the server, your site itself is drastically less secure.

What do you mean by "once" and "twice"? Are you comparing splitting the hashing cost across client and server vs. adding extra hashing cost on client?


If you are going to do this, why don't you just do actual public key crypto and never send an actual password? Just do mTLS.


What happens when the user loses their private key?


If you do layer 7 NSM/DPI then you're most likely logging passwords. The good news is that this data typically has significantly stronger access controls than ordinary logs, as it's collected for secops.


I am wondering why sending a one-time login link to an email address is still not a thing.



