Too high estimates when finding words in dictionaries #66

per-erik · 2021-11-17T17:14:54Z

The way that DictionaryMatcher#match works isn't consistent, or at least one could make an argument that it isn't. It gives too high an entropy to some passwords that are just "worse" l33t-substituted versions of some other password. For example:

The password "passw0rd" will be matched directly to a word found in the dictionary "passwords". In that dictionary "passw0rd" is at rank 411 so it is given an entropy of ~8.68.

The password "p4s5w0rd" will not be matched directly to a word found in the dictionary "passwords". Instead, the match method will try l33t-substitutions, and one of those substitutions yields the word "password". That word is then looked up in the dictionaries and found at rank 2 in the "passwords" dictionary. Thus, the password is given an entropy of ~3.32.

The "easy" fix for this might be to not short-circuit the match method with "continue" in the for-loops, but I'm not sure how that would affect other parts of the system, and it would increase the running time of estimating a password.
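
To make the difference concrete, here is a minimal, self-contained sketch of the two lookup strategies. It is not the library's code: the class `LeetLookupSketch`, its methods, and the three-entry substitution table are hypothetical, and entropy is simplified to log2(rank). The real matcher apparently adds something for the substitution itself (the issue reports ~3.32 for a rank-2 word), which this sketch ignores. Only the two ranks are taken from the example above.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not the actual DictionaryMatcher.
final class LeetLookupSketch {
    // Ranks quoted in the issue for the "passwords" dictionary.
    static final Map<String, Integer> RANKS = new HashMap<>();
    // Assumed, deliberately tiny l33t table (digit -> letter).
    static final Map<Character, Character> UNLEET = new HashMap<>();
    static {
        RANKS.put("password", 2);
        RANKS.put("passw0rd", 411);
        UNLEET.put('0', 'o');
        UNLEET.put('4', 'a');
        UNLEET.put('5', 's');
    }

    // Simplified entropy: log2 of the dictionary rank.
    static double entropyOf(int rank) {
        return Math.log(rank) / Math.log(2);
    }

    // Candidates in lookup order: the raw string, then its un-l33ted form.
    static Set<String> candidates(String password) {
        Set<String> out = new LinkedHashSet<>();
        out.add(password);
        StringBuilder sb = new StringBuilder();
        for (char c : password.toCharArray()) {
            char mapped = UNLEET.getOrDefault(c, c);
            sb.append(mapped);
        }
        out.add(sb.toString());
        return out;
    }

    // shortCircuit = true stops at the first dictionary hit;
    // shortCircuit = false checks every candidate and keeps the lowest entropy.
    static double estimate(String password, boolean shortCircuit) {
        double best = Double.POSITIVE_INFINITY;
        for (String candidate : candidates(password)) {
            Integer rank = RANKS.get(candidate);
            if (rank == null) {
                continue; // not in the dictionary, try the next candidate
            }
            best = Math.min(best, entropyOf(rank));
            if (shortCircuit) {
                break;
            }
        }
        return best;
    }
}
```

With short-circuiting, "passw0rd" stops at its direct hit (rank 411), while "p4s5w0rd" falls through to "password" (rank 2). Without it, both end up with the rank-2 value, at the cost of always looking up every candidate.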

Note that I'm not saying that "p4s5w0rd" is more secure than "passw0rd" and should get a higher score. What I'm saying is that its estimated entropy shouldn't be lower, since both strings are essentially a l33t-substitution on the word "password" - it's just that one of the strings is a common password while the other one isn't.

In general, if two strings can both be transformed into the same string and that transformed string has a lower entropy than either of the non-transformed strings, then both the non-transformed strings should get the entropy value of the transformed string.
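
One way to write that rule down, as a sketch: let $H(s)$ be the entropy the dictionary lookup assigns to a string $s$ (taken as $+\infty$ when $s$ is not in any dictionary), and let $T(s)$ be the set of strings reachable from $s$ by l33t-substitution. Both symbols are introduced here for illustration and do not come from the library.

```latex
\hat{H}(s) \;=\; \min\Bigl( H(s),\ \min_{t \in T(s)} H(t) \Bigr)
```

Since "password" is in $T(s)$ for both "passw0rd" and "p4s5w0rd", both would receive the same estimate under this rule.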

Tostino · 2023-01-05T17:29:51Z

Yup, this looks like it's doing something a bit dumb there. Thanks for pointing this logic failure out. Will see what I can do here.

Tostino self-assigned this Jul 18, 2024
