Passwords recognized as single tokens inconsistently rewarded for capitalization #232

Open
lambdacasserole opened this issue Jun 21, 2018 · 0 comments


First of all, great library, great paper and really enjoyable talk at USENIX. We've been using this library at Teesside University's Software Reliability Lab for some research around the security of password composition policies, and it yields some really interesting data (and drops right into our tooling).

While we were working with zxcvbn, we uncovered a few interesting characteristics that we hope to present to the maintainers here in future, but for now there's one particular issue that I'd like to propose a fix for.

Overview

The crux of the issue is as follows:

The uppercase_variations function takes a match and returns a guess number multiplier based on how that match's word is capitalized. This works great for passwords that have a worst-case (i.e. lowest guess number) partitioning that does not contain partitions with non-letter characters at the beginning or end, but otherwise rewards capitalization too generously.

Demonstration

For an example of what I mean, take bananas123 as a case in which there's no issue. This string is cleanly partitioned into bananas (dictionary) and 123 (sequence). A guess number is calculated for each partition, and these are then combined into the overall guess number. Capitalizing the b or the s in this password multiplies the guess number calculated for bananas by 2. Everything works as expected.

Now consider 12345qwert, a case where things don't quite work as expected. The worst-case partitioning of this password contains only one token, 12345qwert (dictionary), because the whole string can be found in the library's internal common passwords list. Because uppercase_variations does not strip non-letter characters before computing the guess number multiplier for a partition, capitalizing the q is treated as a capital letter "in the middle" of the token, even though it is intuitively in a terminal position within the qwert substring and warrants only a flat multiplier of 2. This leads to some strange inversions, such as 12345Qwert (guess number 2521) being valued as stronger than 12345qwerT (guess number 1009).
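To make the behaviour concrete, here is a minimal Python model of how I understand the multiplier to be computed. It is a sketch for illustration, not the library's source: the pattern names and the function signature are assumptions, and it takes the token string directly rather than a match object.

```python
import re
from math import comb

# Patterns assumed for this sketch; they are applied to the raw token,
# digits and symbols included.
START_UPPER = re.compile(r"^[A-Z][^A-Z]+$")
END_UPPER = re.compile(r"^[^A-Z]+[A-Z]$")
ALL_UPPER = re.compile(r"^[^a-z]+$")
ALL_LOWER = re.compile(r"^[^A-Z]+$")


def uppercase_variations(token: str) -> int:
    """Capitalization multiplier for a token, computed without stripping."""
    if ALL_LOWER.match(token) or token.lower() == token:
        return 1
    # First-letter, last-letter and all-caps schemes get a flat multiplier.
    if any(p.match(token) for p in (START_UPPER, END_UPPER, ALL_UPPER)):
        return 2
    # Otherwise count the ways to place up to min(U, L) capitals among U + L letters.
    upper = sum(1 for c in token if "A" <= c <= "Z")
    lower = sum(1 for c in token if "a" <= c <= "z")
    return sum(comb(upper + lower, i) for i in range(1, min(upper, lower) + 1))


print(uppercase_variations("Bananas"))     # 2
print(uppercase_variations("12345qwerT"))  # 2
print(uppercase_variations("12345Qwert"))  # 5
```

In this model 12345Qwert misses the "starts with a capital" pattern because the token starts with a digit, so it falls through to the combinatorial case and gets 5 rather than 2. That matches the guess numbers quoted above: 2521 - 1 = 2520 and 1009 - 1 = 1008 are 5 times and 2 times the same base of 504.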

Evidence

The following table shows a sample of tokens from the common passwords list that exhibit the issue described, against their number of occurrences in Troy Hunt's Pwned Passwords as: all-lowercase; with their first letter capitalized; and with their last letter capitalized:

| Password | Lowercase | Capitalized (First) | Capitalized (Last) |
| --- | --- | --- | --- |
| 1q2w3e4r5t | 1109333 | 5347 | 1021 |
| 1qaz2wsx | 726341 | 3011 | 2271 |
| 123qwe | 675027 | 2906 | 516 |
| 1q2w3e4r | 598708 | 4685 | 3135 |
| 123abc | 595847 | 1215 | 91 |
| 1q2w3e | 473459 | 2171 | 363 |
| 1234qwer | 385656 | 3860 | 815 |
| 12qwaszx | 312045 | 848 | 459 |
| 50cent | 179247 | 478 | 11 |

Now consider the guess number given by zxcvbn to these same tokens:

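| Password | Lowercase | Capitalized (Start) | Capitalized (End) |
| --- | --- | --- | --- |
| 1q2w3e4r5t | 291 | 1451 | 581 |
| 1qaz2wsx | 29 | 169 | 57 |
| 123qwe | 34 | 100 | 67 |
| 1q2w3e4r | 193 | 769 | 385 |
| 123abc | 224 | 670 | 447 |
| 1q2w3e | 687 | 2059 | 1373 |
| 1234qwer | 87 | 345 | 173 |
| 12qwaszx | 350 | 2095 | 699 |
| 50cent | 2248 | 8989 | 4495 |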

In all cases shown, the more common capitalization scheme is given a higher guess number. This is the opposite of what we want. After stripping all non-letter characters before doing any calculation in uppercase_variations we get the following:

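| Password | Lowercase | Capitalized (Start) | Capitalized (End) |
| --- | --- | --- | --- |
| 1q2w3e4r5t | 291 | 581 | 581 |
| 1qaz2wsx | 29 | 57 | 57 |
| 123qwe | 34 | 67 | 67 |
| 1q2w3e4r | 193 | 385 | 385 |
| 123abc | 224 | 447 | 447 |
| 1q2w3e | 687 | 1373 | 1373 |
| 1234qwer | 87 | 173 | 173 |
| 12qwaszx | 350 | 699 | 699 |
| 50cent | 2248 | 4495 | 4495 |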

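Guess numbers are now appropriately low. This change will only ever result in a reduction in guess number, never an increase: stripping either leaves the multiplier unchanged or replaces a larger combinatorial multiplier with the flat multiplier of 2.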

Impact

Running a quick script over the common passwords list to pull out entries that both contain more than one letter and start or end with a non-letter sequence yields 10,310 results. Any password containing one of these as a dictionary partition will have its guess number affected if a capital letter occurs as the first or last letter of that partition.
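The filter is simple enough to sketch. The version below is an illustrative Python reconstruction, not the script we actually ran: `is_affected` is a hypothetical name, and the `sample` list is a handful of hand-picked strings standing in for the real common passwords list.

```python
import string

LETTERS = set(string.ascii_letters)


def is_affected(entry: str) -> bool:
    """True if the entry has more than one letter and a non-letter at either end."""
    if not entry:
        return False
    letter_count = sum(1 for c in entry if c in LETTERS)
    if letter_count <= 1:
        return False
    return entry[0] not in LETTERS or entry[-1] not in LETTERS


# Hypothetical stand-in for the common passwords list.
sample = ["12345qwert", "1q2w3e4r5t", "bananas", "50cent", "123456", "a1", "qwerty123"]
affected = [entry for entry in sample if is_affected(entry)]
print(affected)  # ['12345qwert', '1q2w3e4r5t', '50cent', 'qwerty123']
```

Entries with zero or one letter are skipped because capitalizing a single letter cannot reach the combinatorial case in the first place.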

Remediation

This issue is readily fixed by stripping non-letter characters before we do any computation in the uppercase_variations function. In the case of 12345qwert we now get a guess number of 1009 whether we capitalize the q or the t. I'll submit a PR straight after putting this issue in.
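For reference, a minimal Python model of the proposed change is below. It is a sketch under assumptions and not the patch itself: the pattern names, the `NON_LETTERS` regex and the `strip_non_letters` flag are all illustrative, and the function takes a token string rather than a match object.

```python
import re
from math import comb

# Patterns and names assumed for this sketch.
START_UPPER = re.compile(r"^[A-Z][^A-Z]+$")
END_UPPER = re.compile(r"^[^A-Z]+[A-Z]$")
ALL_UPPER = re.compile(r"^[^a-z]+$")
ALL_LOWER = re.compile(r"^[^A-Z]+$")
NON_LETTERS = re.compile(r"[^A-Za-z]")


def uppercase_variations(token: str, strip_non_letters: bool = True) -> int:
    """Capitalization multiplier; optionally strips non-letters first."""
    word = NON_LETTERS.sub("", token) if strip_non_letters else token
    # An all-digit token strips to "" and falls out here with multiplier 1.
    if ALL_LOWER.match(word) or word.lower() == word:
        return 1
    if any(p.match(word) for p in (START_UPPER, END_UPPER, ALL_UPPER)):
        return 2
    upper = sum(1 for c in word if "A" <= c <= "Z")
    lower = sum(1 for c in word if "a" <= c <= "z")
    return sum(comb(upper + lower, i) for i in range(1, min(upper, lower) + 1))


for token in ("12345Qwert", "12345qwerT"):
    before = uppercase_variations(token, strip_non_letters=False)
    after = uppercase_variations(token)
    print(token, before, after)
# 12345Qwert 5 2
# 12345qwerT 2 2
```

Keeping the old behaviour behind a flag is only there to make the before and after comparison easy in this sketch; the actual fix would simply strip unconditionally.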
