Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Too high estimates when finding words in dictionaries #66

Open
per-erik opened this issue Nov 17, 2021 · 1 comment
Open

Too high estimates when finding words in dictionaries #66

per-erik opened this issue Nov 17, 2021 · 1 comment
Assignees

Comments

@per-erik
Copy link

The way that DictionaryMatcher#match works isn't consistent, or at least one could make an argument that it isn't. It gives too high entropy to some passwords that are just "worse" l33t-substituted versions of some other password. For example

The password "passw0rd" will be matched directly to a word found in the dictionary "passwords". In that dictionary "passw0rd" is at rank 411 so it is given an entropy of ~8.68.

The password "p4s5w0rd" will not be matched directly to a word found in the dictionary "passwords". Instead the match-method will try to use l33t-substitutions and one of those substitutions is the word "password". That word is then looked up in the dictionaries and found at rank 2 in the "passwords" dictionary. Thus, the password is given an entropy of ~3.32.

The "easy" fix for this might be to not short circuit the match-method with "continue" in the for-loops, but I'm not sure how that would affect other parts of the system and it will increase the running time of estimating a password.

Note that I'm not saying that "p4s5w0rd" is more secure than "passw0rd" and should get a higher score. What I'm saying is that its estimated entropy shouldn't be lower, since both strings are essentially a l33t-substitution on the word "password" - it's just that one of the strings is a common password while the other one isn't.

In general, if two strings can both be transformed into the same string and that transformed string has a lower entropy than either of the non-transformed strings, then both the non-transformed strings should get the entropy value of the transformed string.

@Tostino
Copy link
Collaborator

Tostino commented Jan 5, 2023

Yup, this looks like it's doing something a bit dumb there. Thanks for pointing this logic failure out. Will see what I can do here.

@Tostino Tostino self-assigned this Jul 18, 2024
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants