First of all, great library, great paper and really enjoyable talk at USENIX. We've been using this library at Teesside University's Software Reliability Lab for some research around the security of password composition policies, and it yields some really interesting data (and drops right into our tooling).
While we were working with zxcvbn, we uncovered a few interesting characteristics that we hope to present to the maintainers here in future, but for now there's one particular issue that I'd like to propose a fix for.
Overview
The crux of the issue is as follows:
The uppercase_variations function takes a match and returns a guess number multiplier based on how that match's word is capitalized. This works great for passwords whose worst-case (i.e. lowest guess number) partitioning does not contain partitions with non-letter characters at the beginning or end, but otherwise it rewards capitalization too generously.
Demonstration
For an example of what I mean, take bananas123 as a case in which there's no issue. This string is cleanly partitioned into bananas (dictionary) and 123 (sequence). A guess number is calculated for each partition, and these are then combined into the overall guess number. Capitalizing the b or the s in this password multiplies the guess number calculated for bananas by 2. Everything works as expected.
Now consider 12345qwert, a case where things don't quite work as expected. The worst-case partitioning of this password contains only one token, 12345qwert (dictionary), because it appears in the library's internal common passwords list. Because uppercase_variations does not strip non-letter characters before computing the guess number multiplier for a partition, capitalizing the q rewards the password for having a capital letter "in the middle", when the q is intuitively in a terminal position in the qwert substring and warrants only a flat multiplier of 2. This leads to some strange inversions, like 12345Qwert (guess number 2521) being valued as stronger than 12345qwerT (guess number 1009).
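To make the behaviour concrete, here is a rough Python sketch of the multiplier logic as I understand it. The library itself is not written in Python, and the regular expressions and names below are my own approximation for illustration, not a copy of the library's source.

```python
import re
from math import comb

# Approximations of the capitalization schemes that get a flat 2x multiplier.
START_UPPER = re.compile(r"^[A-Z][^A-Z]+$")  # e.g. "Bananas"
END_UPPER = re.compile(r"^[^A-Z]+[A-Z]$")    # e.g. "bananaS"
ALL_UPPER = re.compile(r"^[^a-z]+$")         # e.g. "BANANAS"


def uppercase_variations(token: str) -> int:
    """Approximate the capitalization multiplier for a dictionary token."""
    if token.lower() == token:
        return 1  # no capital letters, so no multiplier
    if any(p.match(token) for p in (START_UPPER, END_UPPER, ALL_UPPER)):
        return 2  # common scheme: flat multiplier
    upper = sum(1 for c in token if "A" <= c <= "Z")
    lower = sum(1 for c in token if "a" <= c <= "z")
    # Otherwise count the ways to capitalize up to min(upper, lower) letters.
    return sum(comb(upper + lower, i) for i in range(1, min(upper, lower) + 1))


for pw in ("Bananas", "bananaS", "12345Qwert", "12345qwerT"):
    print(pw, uppercase_variations(pw))
# Bananas 2
# bananaS 2
# 12345Qwert 5
# 12345qwerT 2
```

The leading digits stop 12345Qwert from matching the "first letter capitalized" pattern, so it falls through to the combinatorial case, while 12345qwerT still matches the "last letter capitalized" pattern.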
Evidence
The following table shows a sample of tokens from the common passwords list that exhibit the issue described, against their number of occurrences in Troy Hunt's Pwned Passwords as: all-lowercase; with their first letter capitalized; and with their last letter capitalized:
| Password | Lowercase | Capitalized (First) | Capitalized (Last) |
|---|---|---|---|
| 1q2w3e4r5t | 1109333 | 5347 | 1021 |
| 1qaz2wsx | 726341 | 3011 | 2271 |
| 123qwe | 675027 | 2906 | 516 |
| 1q2w3e4r | 598708 | 4685 | 3135 |
| 123abc | 595847 | 1215 | 91 |
| 1q2w3e | 473459 | 2171 | 363 |
| 1234qwer | 385656 | 3860 | 815 |
| 12qwaszx | 312045 | 848 | 459 |
| 50cent | 179247 | 478 | 11 |
Now consider the guess number given by zxcvbn to these same tokens:
| Password | Lowercase | Capitalized (Start) | Capitalized (End) |
|---|---|---|---|
| 1q2w3e4r5t | 291 | 1451 | 581 |
| 1qaz2wsx | 29 | 169 | 57 |
| 123qwe | 34 | 100 | 67 |
| 1q2w3e4r | 193 | 769 | 385 |
| 123abc | 224 | 670 | 447 |
| 1q2w3e | 687 | 2059 | 1373 |
| 1234qwer | 87 | 345 | 173 |
| 12qwaszx | 350 | 2095 | 699 |
| 50cent | 2248 | 8989 | 4495 |
In all cases shown, the more common capitalization scheme is given a higher guess number. This is the opposite of what we want. After stripping all non-letter characters before doing any calculation in uppercase_variations we get the following:
| Password | Lowercase | Capitalized (Start) | Capitalized (End) |
|---|---|---|---|
| 1q2w3e4r5t | 291 | 581 | 581 |
| 1qaz2wsx | 29 | 57 | 57 |
| 123qwe | 34 | 67 | 67 |
| 1q2w3e4r | 193 | 385 | 385 |
| 123abc | 224 | 447 | 447 |
| 1q2w3e | 687 | 1373 | 1373 |
| 1234qwer | 87 | 173 | 173 |
| 12qwaszx | 350 | 699 | 699 |
| 50cent | 2248 | 4495 | 4495 |
Guess numbers are now appropriately low. This change will only ever result in a reduction in guess number, never an increase.
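As a sanity check on the figures, the guess numbers in the tables appear to follow guess = (lowercase guess - 1) × multiplier + 1, where the multiplier is the number of letters in the token before the change (one capital among n letters) and 2 after it. That relationship is inferred from the numbers themselves rather than taken from the library's documentation, so treat the Python sketch below as an assumption that happens to fit the data.

```python
from math import comb

# (lowercase guess number, count of letters in the token), read from the tables.
rows = {
    "1q2w3e4r5t": (291, 5),
    "123qwe": (34, 3),
    "50cent": (2248, 4),
}


def scaled(lowercase_guess: int, multiplier: int) -> int:
    # Assumed relationship, inferred from the tables: strip the +1,
    # apply the capitalization multiplier, then add the +1 back.
    return (lowercase_guess - 1) * multiplier + 1


for token, (guess, letters) in rows.items():
    before = scaled(guess, comb(letters, 1))  # first letter capitalized, unpatched
    after = scaled(guess, 2)                  # flat multiplier once non-letters are stripped
    print(token, before, after)
# 1q2w3e4r5t 1451 581
# 123qwe 100 67
# 50cent 8989 4495
```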
Impact
Running a quick script over the common passwords list to pull out entries that both contain more than one letter and start or end with a non-letter sequence yields 10,310 results. Any password containing these as dictionary partitions will have its guess number affected if a capital letter occurs as the first or last letter in the string.
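For anyone who wants to reproduce the count, the filter is along these lines. This is a Python sketch of the criteria described above, not the exact script I ran; the sample list is made up for illustration and stands in for the real common passwords list, and "letter" here means ASCII a-z or A-Z.

```python
def is_letter(ch: str) -> bool:
    """True for ASCII letters only."""
    return ("a" <= ch <= "z") or ("A" <= ch <= "Z")


def is_affected(entry: str) -> bool:
    """More than one letter, and a non-letter at the start or the end."""
    if sum(1 for ch in entry if is_letter(ch)) < 2:
        return False
    return not is_letter(entry[0]) or not is_letter(entry[-1])


# Hypothetical stand-in for the common passwords list.
sample = ["password", "12345qwert", "1qaz2wsx", "50cent", "a1", "123456", "qwerty1"]

affected = [entry for entry in sample if is_affected(entry)]
print(affected)
# ['12345qwert', '1qaz2wsx', '50cent', 'qwerty1']
```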
Remediation
This issue is readily fixed by stripping non-letter characters before we do any computation in the uppercase_variations function. In the case of 12345qwert we now get a guess number of 1009 whether we capitalize the q or the t. I'll submit a PR straight after putting this issue in.
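In Python terms, the idea looks roughly like the sketch below. It is an illustration of the approach under my own assumptions about the multiplier logic (the regular expressions and function name are approximations), not the patch itself.

```python
import re
from math import comb

START_UPPER = re.compile(r"^[A-Z][^A-Z]+$")
END_UPPER = re.compile(r"^[^A-Z]+[A-Z]$")
ALL_UPPER = re.compile(r"^[^a-z]+$")
NON_LETTER = re.compile(r"[^A-Za-z]")


def uppercase_variations_stripped(token: str) -> int:
    """Capitalization multiplier computed on the letters of the token only."""
    word = NON_LETTER.sub("", token)  # the proposed change: drop non-letters first
    if word.lower() == word:
        return 1  # also covers tokens with no letters at all
    if any(p.match(word) for p in (START_UPPER, END_UPPER, ALL_UPPER)):
        return 2
    upper = sum(1 for c in word if c.isupper())
    lower = sum(1 for c in word if c.islower())
    return sum(comb(upper + lower, i) for i in range(1, min(upper, lower) + 1))


for pw in ("12345Qwert", "12345qwerT", "1Q2w3e4r5t", "12345qWert"):
    print(pw, uppercase_variations_stripped(pw))
# 12345Qwert 2
# 12345qwerT 2
# 1Q2w3e4r5t 2
# 12345qWert 5
```

A capital that is genuinely in the middle of the letters, as in 12345qWert, still gets the larger combinatorial multiplier; only capitals on the first or last letter drop to the flat 2.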