Every single time Facebook announces something like this, they announce low numbers and then repeatedly revise them upwards.
Is there no penalty to them for basically lying to everyone in their initial announcement? Doing it once could be explained by their sincerely believing the original number was correct, but they have a history of doing this, and at this point there's no chance I believe they aren't doing it on purpose.
I think revising upwards is them looking for other bugs of the same type, and finding lots.
I can totally imagine someone on the Instagram photo pinch-to-zoom team saying "Whoa - if someone re-logs in while a photo is partially zoomed, our metrics capture the password. That's bad!". They find it's super rare, and only impacts 10k users.
Later, other logfiles are grepped for passwords (possibly even by doing a full-text search across all fields and blobs for common passwords), and more and more are found.
Then someone says "but some of our data is in base64-encoded fields - did you try grepping for base64-encoded plaintext?", and even more are found, etc.
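Roughly the kind of sweep I mean (passwords, path, and the scanning are all made up for illustration; this only catches plaintext and straight base64, not shifted alignments):

    import base64

    # Hypothetical inputs: a handful of common/test passwords and a log path.
    COMMON_PASSWORDS = ["password123", "hunter2", "letmein"]
    LOG_PATH = "app.log"

    def build_needles(secrets):
        # Each secret as plaintext, plus its straight base64 encoding
        # (trailing '=' stripped so it still matches mid-string).
        needles = []
        for s in secrets:
            needles.append(s)
            needles.append(base64.b64encode(s.encode()).decode().rstrip("="))
        return needles

    def scan(path, secrets):
        needles = build_needles(secrets)
        hits = []
        with open(path, errors="replace") as f:
            for lineno, line in enumerate(f, 1):
                if any(n in line for n in needles):
                    hits.append(lineno)
        return hits

    print(scan(LOG_PATH, COMMON_PASSWORDS))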
Benefit of the doubt is something you give to a company the very first time it does something like this. Not something you give the umpteenth time they do it, and especially not when they try to hide their "sorry it was waaaay worse" update amid other high-profile news.
> Then someone says "but some of our data is in base64-encoded fields - did you try grepping for base64-encoded plaintext?", and even more are found, etc.
This is painfully accurate. I've encountered exactly this kind of thing, including having to repeat search efforts.
This seems like it would happen with most breaches.
You learn about an issue, do some investigation, get an estimate, then inform the public. If Facebook had waited months to complete the investigation and get these numbers, do you think people would have accepted that? Probably not. So they publish what they know when they think they've got it figured out. That probably also spins up a team that goes looking for other bad logging. Then a while later they find more errors and the numbers are revised up.
I can't think of any other company that has a history of downplaying breaches and then revising the numbers upwards. If you don't know how many people are impacted, you aren't supposed to just make up a number. At a bare minimum you'd say something like "We know 10k users are definitely impacted, and we estimate this could be as high as 5M", or whatever the appropriate ballpark is.
Ouch. Is this a one-time thing, or do they have a history of that?
What I was trying to say was that I can understand companies making this mistake once. Maybe even twice. But once a company has an established pattern of doing this, they lose the benefit of the doubt and we have to assume they're doing it deliberately. Even more so if they do scummy things like trying to bury their "it's 1000x worse" update with other high-profile news.
Facebook has a long history of this kind of thing now; every new privacy violation of theirs must be met with the expectation that it's actually 1000x worse than they say. I'm not aware of any other company with a history of doing this multiple times.
New rule: whenever Facebook announces a breach and its scale of effect, just square the scale of effect right off the bat. You'll probably still be a little low, but you'll be a lot closer to the number in the second (or third, or fourth) press release.
"The security lapse was first reported last month, but at the time, Facebook said it only happened to “tens of thousands of Instagram users,” whereas the number is now being revised up to “millions.” The issue also affected “hundreds of millions of Facebook Lite users” and “tens of millions of other Facebook users.”"
I wonder how much of the responsibility lies in how Instagram was built. Instagram may have been built so fast that these things were never accounted for.
Not that I'm giving Facebook excuses, but maybe something was overlooked before acquisition. Then again, Facebook is breaching security protocols left and right.
"...various errors seem to have caused Facebook’s systems to log some passwords in plain text since as early as 2012."
1) As a software engineer, I can't imagine how such errors could possibly have entered production code accidentally, especially after code review. If precise details of these errors are released, I am open to having my mind changed.
2) Even if they did, I can't see how it would take Facebook's tens of thousands of engineers 7 years to find this "bug".
If they logged complete requests, the password might well be buried somewhere deep in the request body, and not readily apparent to whoever is skimming the logs - especially if this doesn't happen for all requests, but only for some subset of them.
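A toy illustration (handler and fields invented) of how an innocent-looking error handler ends up with the password in its logs:

    import json
    import logging

    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger("app")

    def do_login(body):
        # Stand-in for real application code that fails on some edge case.
        raise ValueError("unexpected field")

    def handle(body):
        try:
            return do_login(body)
        except Exception:
            # Looks harmless in review ("log the request so we can reproduce
            # the bug"), but the password rides along inside the body and
            # nothing flags it to whoever skims these logs later.
            logger.exception("login failed, request body: %s", json.dumps(body))

    handle({"username": "alice", "password": "hunter2", "remember_me": True})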
Wouldn't it be pretty simple (for FB) to create accounts with super-unique passwords as part of their test process? Then they scan all their storage for those passwords? Or am I missing something?
To implement, yes. But you'd have to think about this as a problem first. The very fact that we're having this discussion here now shows that it's not all that obvious, at least for your average dev. How their privacy and/or security review missed it is a more interesting question.
That test user also would have to hit every nook and cranny of their code. That’s hard, and may not be repeatably possible (some code may only run at first login, or when making your 1024th post, or if you’re posting a movie before you ever post a photo, or when a single server sees its millionth post, etc)
They also would have to run it in such a way after every code update. Since every code update may introduce a new nook or cranny, that would slow down development too much.
I wouldn’t even try doing that, but instead have a two-pronged defense:
- code review of every single log statement in the code base by individuals whose _only_ job it is to prevent such problems.
- permanent checking of every single line logged for a thousand or so common passwords.
At Facebook’s scale, the second probably has lots of false positives, so I would do that at time of logging, when the location doing that logging is known, so that an alarm only gets triggered if a single log statement repeatedly logs passwords.
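A minimal sketch of that second prong, assuming a tiny made-up password list and a placeholder alert:

    import logging
    from collections import Counter

    # Hypothetical list; in practice you'd load the top ~1000 passwords,
    # minus any strings your own tests legitimately use.
    COMMON_PASSWORDS = {"password", "123456", "qwerty", "letmein"}

    class PasswordCanaryFilter(logging.Filter):
        """Counts hits per logging call site, so one stray false positive
        doesn't page anyone but a statement that keeps logging passwords does."""

        def __init__(self, threshold=3):
            super().__init__()
            self.threshold = threshold
            self.hits = Counter()

        def filter(self, record):
            message = record.getMessage()
            if any(p in message for p in COMMON_PASSWORDS):
                site = (record.pathname, record.lineno)
                self.hits[site] += 1
                if self.hits[site] == self.threshold:
                    # Placeholder for a real alerting hook (pager, ticket, ...).
                    print(f"ALERT: possible password logging at {site}")
            return True  # never drop the record here, just raise the alarm

    logging.basicConfig(level=logging.INFO)
    logging.getLogger().addFilter(PasswordCanaryFilter())
    for _ in range(3):
        logging.info("login attempt: user=alice password=letmein")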
Yes, I do. They're called canaries by the security teams. It's especially useful when you run automated tests while pushing updates through your pipeline. Scan the logs and database entries for the canaries.
If someone logs a full request, the first thing that should come to mind is privacy.
Someone who works as a dev at Facebook should know that HTTP requests can contain sensitive data: session cookies, passwords, credit card information, etc.
Probably because you don't work on systems anywhere near the scale of Facebook. Facebook has 2.3 billion MAUs, so if they had some edge case that hit 0.1% of users, that's already 2.3 million users.
Scrubbing passwords and other sensitive information from logs is such a basic requirement. It’s something you write a regression test for and never think about again.
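Roughly, as a stdlib-only sketch (the login function and test password are invented):

    import logging
    import unittest

    SECRET = "correct horse battery staple"  # hypothetical test-only password

    def login(username, password):
        # Stand-in for the application code under test.
        logging.info("login attempt for %s", username)
        return True

    class NoSecretsInLogs(unittest.TestCase):
        def test_password_never_logged(self):
            with self.assertLogs(level="DEBUG") as captured:
                login("alice", SECRET)
            for line in captured.output:
                self.assertNotIn(SECRET, line)

    if __name__ == "__main__":
        unittest.main()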
Not too far-fetched... For example, I've seen people put in place logging systems that logged the whole request body, and then someone changed a field name and, all of a sudden, as a side effect, the code that sanitizes the body to remove sensitive information no longer did its job.
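A toy version of that failure mode (field names invented) - the sanitizer keys off the old name and quietly stops scrubbing after the rename:

    import copy
    import json
    import logging

    logging.basicConfig(level=logging.INFO)

    # Sanitizer written back when the field was called "password".
    SENSITIVE_FIELDS = {"password", "credit_card"}

    def sanitize(body):
        clean = copy.deepcopy(body)
        for field in SENSITIVE_FIELDS:
            if field in clean:
                clean[field] = "***"
        return clean

    def log_request(body):
        logging.info("request: %s", json.dumps(sanitize(body)))

    # Later a client change renames the field to "passwd", and the sanitizer
    # silently stops doing its job: no error, no warning, password in the log.
    log_request({"user": "alice", "passwd": "hunter2"})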
It’s a good start at least. You still have to worry about the password being in the middle of a base64 encoded string and weird cases like that but at least you could grep for some known things.
2. You create a system that detects high entropy content before being logged.
3. You don't want to drop all high entropy content, so you create some rules about where in requests to look for high entropy content.
4. Something about the request structure changes, breaking your log filtering.
5. There is nothing that notices the drop in the amount of content filtered out of logs.
There are oodles of ways this could happen. I'd wager that more than half of all businesses with a website that handles passwords have logged passwords in plaintext somewhere.
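For the entropy check in step 2, a rough sketch of what it might look like (the thresholds and tokenizer are invented; tuning them per field/location is step 3, and that is exactly the part that rots in steps 4 and 5):

    import math
    import re

    def shannon_entropy(s):
        # Bits of entropy per character of the string.
        counts = {c: s.count(c) for c in set(s)}
        return -sum(n / len(s) * math.log2(n / len(s)) for n in counts.values())

    def looks_like_secret(token, min_len=12, min_entropy=3.5):
        return len(token) >= min_len and shannon_entropy(token) >= min_entropy

    def flag_line(line):
        # Split the log line into rough tokens and flag the high-entropy ones.
        return [t for t in re.split(r"[\s\"'=:,]+", line) if looks_like_secret(t)]

    print(flag_line('POST /login body={"user": "alice", "password": "x9Kp2mQvL8rTz4Wn"}'))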
Perhaps checking the logs for the X most common passwords could also work? At Facebook's scale, such a bug would very likely trigger some positives.
During a job interview at one mid-size startup one of the interview questions involved handing me a 2 page excerpt of server logs and asking me to identify bugs/issues from the messages and tracebacks in the logs. (neat idea!)
The logs contained user credentials, and they hadn't noticed. I pointed it out to the CTO.
Hey, in the wild! I've long advocated for companies to use actual bugs (that have been fixed) in testing candidates. Heck, as your anecdote suggests, they might find the candidate provides a fix better than their ninja rockstar did, and in this way could serve as part of a performance review for existing employees. ;)
If it's a plaintext password, the minute it hits any human eyes, it is IMO compromised and the user should be required to change it. Employee, Interviewee, Hacker - doesn't matter.
This happens all the time, likely as the side-effect of some other activity - instrumentation related to exception handling / error monitoring, performance analysis, debugging, etc.
My assumption since the start of this is that it's telemetry / server log data. Honestly, I came really close to doing this exact thing.
My engineer: "Should we log the body of api requests?"
Me: "Yes, of course!....... waaaait a second. No."
The result was manually whitelisting the fields that SHOULD be logged on each endpoint. It's a pain compared to "just log everything" or even a blacklist, but it's far safer.
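Roughly what that looks like (endpoints and field names invented); anything not explicitly listed gets dropped by default:

    import json
    import logging

    logging.basicConfig(level=logging.INFO)

    # Per-endpoint whitelist of fields that are safe to log. Anything not
    # listed gets dropped, so a newly added sensitive field is silently
    # excluded by default instead of silently included.
    LOGGABLE_FIELDS = {
        "/login": {"username", "remember_me"},
        "/photos/upload": {"album_id", "filter"},
    }

    def log_request(endpoint, body):
        allowed = LOGGABLE_FIELDS.get(endpoint, set())
        safe = {k: v for k, v in body.items() if k in allowed}
        logging.info("%s %s", endpoint, json.dumps(safe))

    log_request("/login", {"username": "alice", "password": "hunter2", "remember_me": True})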
>1) As a software engineer, I can't imagine how such errors could possibly have entered production code accidentally, especially after code review. If precise details of these errors are released, I am open to having my mind changed.
You can't imagine? Really? An error occurs, and the request that caused the error gets logged.
Obviously you shouldn't log passwords, but what kind of security mechanism would have been effective at catching this error?
One thing I can think of is have some production test accounts that are regularly used and have a unique password. Then have an automated task that periodically greps the production logs for the password to see if you have a log leak.
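A bare-bones sketch of that sweep (canary passwords and log paths made up):

    import glob

    # Hypothetical: a few long-lived test accounts whose passwords exist
    # nowhere else, exercised regularly by automation.
    CANARY_PASSWORDS = ["c4n4ry-9f81b2e7d3", "c4n4ry-71aa0c5e44"]
    LOG_GLOB = "/var/log/app/*.log"

    def sweep():
        leaks = []
        for path in glob.glob(LOG_GLOB):
            with open(path, errors="replace") as f:
                for lineno, line in enumerate(f, 1):
                    if any(p in line for p in CANARY_PASSWORDS):
                        leaks.append((path, lineno))
        return leaks

    for path, lineno in sweep():
        print(f"canary password found in {path}:{lineno} - logs are leaking credentials")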
I was also thinking about using canaries like this to detect database leaks - basically, insert fake users into your DB with random emails, and if you see that somebody is trying to email those accounts or log in to them, you know you have been compromised. Not sure if anybody has ever used something like this...
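A rough sketch of the login-side half of that (addresses invented; the alert is just a placeholder):

    # Hypothetical honeytoken check inside the login path.
    HONEYTOKEN_EMAILS = {
        "qz7.canary.1@example.com",  # inserted into the users table,
        "qz7.canary.2@example.com",  # never given to a real person
    }

    def alert_security_team(message):
        print("ALERT:", message)  # placeholder for a real paging hook

    def on_login_attempt(email):
        if email in HONEYTOKEN_EMAILS:
            # Nobody legitimate knows these addresses exist, so any attempt
            # to use one suggests the user table has been exfiltrated.
            alert_security_team(f"honeytoken login attempt: {email}")

    on_login_attempt("qz7.canary.1@example.com")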
Real users will find routes through your app that automated tests never consider. For example, what if the user's 1-year login cookie expires while they're on step 3 of a 4-step "change your avatar" process? That's highly unlikely, and probably not even tested. Yet it might well work (due to good modular design) but also log inappropriate data.
I already do this, both for user passwords and for accidental leaking of internal company data into logfiles.
It also checks for base64 encoded versions of the data (with various alignments). There is also an alert if data is unscannable (due to compression or encryption).
The check is done at logs ingestion points, but also on outgoing http requests from webdriver automated tests (since some third party scripts might be shipping the data off to someone else's server).
The scanned for words are:
* the top 100 passwords, excluding things used as test strings.
* A few company-specific passwords.
* A few testing passwords.
* A few random strings which are also deliberately inserted into source code files in places that should never (by design) pass between client and server.
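For the alignment part, the idea is roughly this (canary and payload invented): a secret buried mid-stream can land on any of three base64 alignments, so you precompute a needle for each:

    import base64

    def base64_needles(secret):
        """Substrings that appear in the base64 of any stream containing
        the secret, whichever of the three 3-byte alignments it lands on."""
        needles = []
        for offset in range(3):
            enc = base64.b64encode(b"\x00" * offset + secret).decode().rstrip("=")
            # The first and last 4-char groups also encode neighbouring bytes
            # we can't know, so keep only the middle (fine for longer secrets).
            needles.append(enc[4:-4] if len(enc) > 8 else enc)
        return needles

    def line_leaks(line, secrets):
        for secret in secrets:
            if secret.decode(errors="ignore") in line:
                return True
            if any(n in line for n in base64_needles(secret)):
                return True
        return False

    # Toy check: the canary buried mid-payload, then the whole blob base64'd.
    canary = b"c4n4ry-9f81b2e7d3"
    blob = base64.b64encode(b'{"note":"hi","pw":"' + canary + b'"}').decode()
    print(line_leaks("POST /track payload=" + blob, [canary]))  # True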
OK, this is getting ridiculous. There are no best practices here. You'd think a company worth billions would have a standard of ethics.
After all, that's what they interview on: best practices, ethics.
Doesn't Facebook have blogs on things like efficient servers, algorithms, and such? They are obviously at the forefront of technology (or know about it). Why aren't they implementing these things?
It seems to me that the more your company is worth, the less you get to bend rules, all to chase more money. HR and lawyers will take care of the rest.
And they announced it while they thought attention was elsewhere (due to the Mueller report release), and in an easy-to-miss manner (as an update to a blog post). Very trust-inspiring.
In a way this is even more worrying (knowing that the Mueller report release time was widely known beforehand).
It's probably "safer" for Facebook to make the announcement just before the guaranteed disruption of the Mueller report release than to release it after the report is out, since they can't predict how long people will stay focused on the report. I can imagine a conversation along those lines happening at Facebook, and that's a really disturbing level of coordination to hide a serious incident.
Even if they want to emulate a user, they can do so without their password. Or, if doing it in a simplistic way, they can log in with the user's hashed password instead of the original plaintext.
Yes, but while "storing passwords in plaintext is bad, mkay?" is accepted in the industry, "transmitting passwords to the server is bad, mkay?" is, in fact, not accepted. It is usually considered "best practice" to transmit passwords to the server, which is obviously wrong.
That doesn't work. If you hash on the client then the hash IS the "password" and is thus sensitive info that you'd need to hash again on the server and shouldn't be stored anywhere in plain text.
While I agree with what you say, one could argue there is the additional element of protecting against user password reuse between sites.
If the client-side hash is strong enough, it means that a plaintext password leak becomes a hashed password leak, which at least protects the user from reuse on other sites.
I’d argue it is “best practice” to present credentials to an authentication server, and provide a token to the various user services to reduce the impact of things like over-logging.
That defeats the purpose of hashing. In your model, the hashed password is all an attacker needs to log in, so the hashed password essentially IS the password. Thus, when you store hashed passwords, you are doing the equivalent of storing plaintext passwords.
No, because if you salt it, then that password becomes unique to your website. Since users tend to reuse passwords across websites, the password becomes far less valuable if it is salted and hashed once on the client before being sent to the server for a second round of salting and hashing.
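A minimal sketch of what I mean, with stdlib PBKDF2 standing in for whatever KDF you'd actually use (site salt, iteration counts, and parameters are all illustrative):

    import hashlib
    import hmac
    import os

    SITE_SALT = b"example.com"  # hypothetical fixed, site-specific salt

    def client_prehash(password):
        # Runs on the client. What goes over the wire (and whatever
        # accidentally logs it) is already useless on other sites.
        return hashlib.pbkdf2_hmac("sha256", password.encode(), SITE_SALT, 100_000)

    def server_store(prehash):
        # The server still treats the prehash as the real credential:
        # per-user salt and hash it again before storing.
        user_salt = os.urandom(16)
        stored = hashlib.pbkdf2_hmac("sha256", prehash, user_salt, 100_000)
        return user_salt, stored

    def server_verify(prehash, user_salt, stored):
        candidate = hashlib.pbkdf2_hmac("sha256", prehash, user_salt, 100_000)
        return hmac.compare_digest(candidate, stored)

    salt, stored = server_store(client_prehash("hunter2"))
    print(server_verify(client_prehash("hunter2"), salt, stored))  # True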
"Far less valuable" depends on whether access to your site is particularly valuable. If you hash passwords once on the client and twice on the server, your site itself is no more secure than if you only hashed passwords once on the server (and not on the client). If you hash passwords once on the client and once on the server, your site itself is drastically less secure. The secondary protection of shared passwords on other sites, from a narrow class of data compromise involving request logs, is not the purpose of hashing, so you have not actually contradicted my statement.
>If you hash passwords once on the client and twice on the server, your site itself is no more secure than if you only hashed passwords once on the server (and not on the client). If you hash passwords once on the client and once on the server, your site itself is drastically less secure.
What do you mean by "once" and "twice"? Are you comparing splitting the hashing cost across client and server vs. adding extra hashing cost on client?
If you do layer 7 NSM/DPI, then you're most likely logging passwords. The good news is that this data typically has significantly stronger access controls than application logs, since it's collected for secops.