If they logged complete requests, the password might well be buried somewhere deep in the request body and not readily apparent to whoever is skimming the logs - especially if this doesn't happen for all requests, but only for some subset of them.
Wouldn’t it be pretty simple (for FB) to create accounts with a super-unique password as part of their test process? Then they could scan all their storage for that super-unique password. Or am I missing something?
To implement, yes. But you'd have to think about this as a problem first. The very fact that we're having this discussion here now shows that it's not all that obvious, at least for your average dev. How their privacy and/or security review missed it is a more interesting question.
That test user would also have to hit every nook and cranny of their code. That’s hard, and may not even be repeatable (some code may only run at first login, or when making your 1024th post, or if you’re posting a movie before you ever post a photo, or when a single server sees its millionth post, etc.)
They would also have to run it after every code update, since every update may introduce a new nook or cranny. That would slow down development too much.
I wouldn’t even try doing that, but instead have a two-pronged defense:
- code review of every single log statement in the code base by individuals whose _only_ job is to prevent such problems.
- permanent checking of every single line logged for a thousand or so common passwords.
At Facebook’s scale, the second probably has lots of false positives, so I would do that check at the time of logging, when the log statement doing the logging is known, so that an alarm only gets triggered if a single log statement repeatedly logs passwords.
Yes, I do. They’re called canaries by the security teams. It’s especially useful when you run automated tests while pushing updates through your pipeline. Scan the logs and database entries for the canaries.
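A minimal sketch of such a scan, with a made-up canary value. In a real pipeline the test account would log in with this password during the automated test run, and a CI step would then sweep the captured logs (and database dumps) for it:

```python
# Hypothetical canary: a high-entropy password no real user would ever pick.
CANARY_PASSWORD = "canary-9f8e2c1a-not-a-real-password"

def find_canary_leaks(lines, canary=CANARY_PASSWORD):
    """Return (line_number, line) for every line containing the canary."""
    return [(n, line) for n, line in enumerate(lines, 1) if canary in line]

# A CI step would fail the build if the scan finds anything:
captured_logs = [
    "POST /login user=canary_user status=200",
    "DEBUG body={'password': 'canary-9f8e2c1a-not-a-real-password'}",
]
leaks = find_canary_leaks(captured_logs)
assert leaks == [(2, captured_logs[1])]  # the deliberately leaky demo line
```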
If someone logs a full request, the first thing that should come to mind is privacy.
Anyone working as a dev at Facebook should know that HTTP requests can contain sensitive data: session cookies, passwords, credit card information, etc.
Probably because you don't work on systems anywhere near the scale of Facebook. Facebook has 2.3 billion MAUs, so if they had some edge case that hit 0.1% of users, that's already 2.3 million users.
Scrubbing passwords and other sensitive information from logs is such a basic requirement. It’s something you write a regression test for and never think about again.
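One illustrative shape for such a scrubber and its regression test - the field names and redaction marker are invented for the example, not any particular framework's API:

```python
import re

# Hypothetical list of sensitive fields whose values must never reach logs.
SENSITIVE_FIELDS = ("password", "passwd", "credit_card", "session_token")

_PATTERN = re.compile(
    r"(?P<key>\b(?:%s)\b\s*[=:]\s*)(?P<value>\S+)" % "|".join(SENSITIVE_FIELDS),
    re.IGNORECASE,
)

def scrub(text):
    """Replace the values of sensitive fields with a redaction marker."""
    return _PATTERN.sub(lambda m: m.group("key") + "[REDACTED]", text)

# The regression test you write once and then never think about again:
def test_scrub():
    assert scrub("password=hunter2 other=ok") == "password=[REDACTED] other=ok"
    assert "s3cr3t" not in scrub("session_token: s3cr3t")
    assert scrub("username=alice") == "username=alice"  # non-sensitive untouched

test_scrub()
```

The test belongs in CI precisely so that the "new nook or cranny" problem discussed above gets caught at the scrubber boundary instead of in production logs.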