That seems backwards to me. The entropy should be relative to how the password i...

Godel_unicode · on Dec 13, 2020

This is the correct answer. Password security is not an absolute thing; a password is secure or insecure relative to an attacker's ability to guess it. Interestingly enough, that's also the motivation for zxcvbn, the open source library Chrome uses to evaluate password strength!

https://github.com/dropbox/zxcvbn

LeifCarrotson · on Dec 13, 2020

It does make a difference because the attackers have comparable sophistication to the generators.

They don't have to brute force all 26^10 lowercase character sequences if they can instead look through 26^3 words in a dictionary. And optionally modifying those words (appending "1!" to them, capitalizing the first character, replacing e with 3, or whatever) doesn't add much work.
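To put rough numbers on that comparison, here is a minimal sketch. The 26^10 and 26^3 figures come from the comment above; the three on/off mangling rules (append "1!", capitalize the first character, replace e with 3) and the resulting factor of 8 are an illustrative assumption, not a measurement of any real cracking tool.

```python
import math

def bits(n: int) -> float:
    """Size of a search space expressed in bits (log2 of the number of candidates)."""
    return math.log2(n)

# Figures taken from the comment: 10 random lowercase letters vs. a ~26^3-word dictionary.
brute_force = 26 ** 10
dictionary = 26 ** 3

# Assumption for illustration: three independent on/off mangling rules
# (append "1!", capitalize first letter, replace e with 3) -> 2*2*2 variants per word.
variants_per_word = 2 * 2 * 2
mangled = dictionary * variants_per_word

print(f"brute force:         {brute_force:>18,} candidates, {bits(brute_force):.1f} bits")
print(f"dictionary:          {dictionary:>18,} candidates, {bits(dictionary):.1f} bits")
print(f"dictionary + rules:  {mangled:>18,} candidates, {bits(mangled):.1f} bits")
```

Under these assumptions the mangling rules add only 3 bits on top of the dictionary, which is why they "don't take much".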

jbay808 · on Dec 13, 2020

Right; so we seem to agree entirely that the entropy is relative to the attacker's generator.

LeifCarrotson · on Dec 13, 2020

Sure, but the attackers you want to consider are the ones who have generators that can match your minimum entropy to generate the password - assume their dictionary is no longer than yours, their set of punctuation doesn't have any characters yours doesn't, etc.

Beldin · on Dec 14, 2020

It depends whether you want to determine an upper bound or a lower bound on #guesses needed.

The article's approach has no assumptions and provides an upper bound. Your approach makes several assumptions and gives a lower bound IFF those assumptions hold.

The upper bound is useful for the general case: no attacker will be worse.

You're trying to account for more sophisticated attackers, which is great! It is however not clear (without further motivation) whether your attacker model is realistic. That is: will there be an attacker who knows this much about your password generation approach, but does not know more?

If no realistic attacker would know this much, your approach gives an overapproximation (real bound is higher). If, otoh, there is an attacker who knows more (eg, seed of the PRNG, or first characters, etc.), it'll be an underapproximation (real bound will be lower).

So, the difference is that the result of the first approach can directly be interpreted, while the result of your approach needs context.

bradknowles · on Dec 14, 2020

The upper bound is a best case analysis. In the best case, I can choose “1” to be my password, and no one will ever guess it.

In the worst case, we assume that the attacker has done their homework and knows the algorithm by which the password was generated, but not the content of this particular password. So, knowing the algorithm, what can the attacker exploit that would help them discover the contents of this password in the shortest possible time?

I submit that the worst case analysis is really the only one we care about.

Beldin · on Dec 14, 2020

My point is that that is not the worst case. Example of a worse case: the attacker knows that + the first 2 characters of your password (real-life example).

The worst case would probably be something like "attacker knows hash, passwd algorithm, full state of machine at passwd generation time incl. random seed and all characters of the password but one". It is clearly far worse than your case, though I find your case more relevant than this one.

dheera · on Dec 13, 2020

Not quite, if you take the a priori assumption that the attacker will test dictionary words, you don't want your password to be a dictionary word.

However, it's so incredibly unlikely a high entropy generator would generate a dictionary word that you shouldn't even have to worry about it.

uoaei · on Dec 13, 2020

Yes, entropy can only be measured in relation to an expected probability distribution. You need to have an a priori model for how the password is generated (selected) to estimate its entropy. So the dichotomy you set up is IMO a false one, in that you are guessing about how the password was generated.

Privacy846 · on Dec 13, 2020

But if you assume that the attacker knows the generation process then you can make a very strong case that the password is no weaker than the stated entropy[1], since the attacker would have to effectively brute force the password by generating all possible passwords using that generator.

[1] Although I guess there are caveats: what if your password _happened_ to be weak according to another generation method that you didn’t use but the attacker guessed?
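As a sketch of what "the stated entropy" means under that assumption: if the generator picks each slot uniformly and independently, the entropy is just log2 of the number of possible outputs. The function name and both example generators below are hypothetical illustrations (8 random base64 characters, and 4 words from a 7776-word Diceware-sized list); they are not taken from the thread.

```python
import math

def generator_entropy_bits(choices_per_slot: int, slots: int) -> float:
    """Entropy of a generator that fills `slots` positions, each chosen
    uniformly and independently from `choices_per_slot` options."""
    return slots * math.log2(choices_per_slot)

def expected_guesses(entropy_bits: float) -> float:
    """Average number of guesses for an attacker who knows the generator and
    enumerates its outputs in arbitrary order: about half of the output space."""
    return 2 ** (entropy_bits - 1)

# Hypothetical generators, for illustration only.
base64_8 = generator_entropy_bits(64, 8)      # 8 random base64 characters
four_words = generator_entropy_bits(7776, 4)  # 4 words from a 7776-word list

print(f"8 base64 chars: {base64_8:.1f} bits, ~{expected_guesses(base64_8):.3g} guesses on average")
print(f"4 random words: {four_words:.1f} bits, ~{expected_guesses(four_words):.3g} guesses on average")
```

This only covers the attacker who knows the generator; the footnote's caveat about a password that happens to look weak under some other guessing strategy is outside what this calculation captures.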

nakkijono · on Dec 13, 2020

Let's say that you use a method such as `openssl rand -base64 6` and out comes "password". The odds of that happening would be crazy low for an individual user. However, if you deploy the same generator for a billion people it could realistically happen, and you might want to filter against outputs like that. Of course if all passwords are autogenerated (users cannot choose), the attacker gains no advantage from choosing "password" instead of "tlnNHJ4x".
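For a feel of the odds, here is a back-of-the-envelope sketch. It assumes the 6 random bytes are uniform (so there are 2^48 equally likely 8-character outputs) and takes the one-billion-user figure from the comment; it only counts the single string "password", so a blocklist of many weak-looking strings would scale the result up proportionally.

```python
import base64
import math

# 6 random bytes encode to exactly 8 base64 characters with no padding.
outputs = 2 ** 48  # number of distinct 6-byte values, assumed equally likely

# "password" is a reachable output: it decodes to 6 bytes that re-encode to itself.
raw = base64.b64decode("password")
reachable = len(raw) == 6 and base64.b64encode(raw) == b"password"

users = 10 ** 9                    # figure from the comment
p_single = 1 / outputs             # chance one user gets exactly "password"
expected_hits = users * p_single   # expected number of users who get it

# P(at least one user gets it), computed stably for tiny probabilities.
p_any = -math.expm1(users * math.log1p(-p_single))

print(f"'password' reachable: {reachable}")
print(f"per-user probability: {p_single:.3g}")
print(f"expected hits among {users:,} users: {expected_hits:.3g}")
print(f"P(at least one hit): {p_any:.3g}")
```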

ravi-delia · on Dec 13, 2020

I suppose it comes down to a worst-case best-case type of thing. If you're trying to prevent Alan Turing and Bletchley Park (new band name, I call it) from guessing your password given spies and psych profiles, you care about the entropy of the generation. In principle the best password actually is whichever one your opponent's cracking software tries last. In between is the real world, but you can't go wrong with real random generation.