Passwords recognized as single tokens inconsistently rewarded for capitalization #232

lambdacasserole · 2018-06-21T16:55:22Z

First of all, great library, great paper and really enjoyable talk at USENIX. We've been using this library at Teesside University's Software Reliability Lab for some research around the security of password composition policies. It yields some really interesting data and drops right into our tooling.

While we were working with zxcvbn, we uncovered a few interesting characteristics that we hope to present to the maintainers here in future, but for now there's one particular issue that I'd like to propose a fix for.

Overview

The crux of the issue is as follows:

The uppercase_variations function takes a match and returns a guess number multiplier based on how that match's word is capitalized. This works well when the worst-case (i.e. lowest guess number) partitioning of a password contains no partitions that begin or end with non-letter characters. Otherwise, it rewards capitalization too generously.

Demonstration

For an example of what I mean, take bananas123 as a case in which there's no issue. This string is cleanly partitioned into bananas (dictionary) and 123 (sequence). A guess number is calculated for each partition, and these are then used to compute the overall guess number. Capitalizing the b or the s in this password multiplies the guess number calculated for bananas by 2. Everything works as expected.

Now consider 12345qwert, a case where things don't quite work as expected. The worst-case partitioning of this password contains only one token, 12345qwert (dictionary), because the whole string appears in the library's internal common passwords list. uppercase_variations does not strip non-letter characters before computing the guess number multiplier for a partition, so capitalizing the q rewards the password for having a capital letter "in the middle" of the token. Intuitively, though, the q is in a terminal position in the qwert substring and warrants only the flat multiplier of 2. This leads to some strange orderings, such as 12345Qwert (guess number 2521) being valued as stronger than 12345qwerT (guess number 1009).
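To make the asymmetry concrete, here is a minimal Python sketch of the multiplier logic as described above. It is a paraphrase rather than the library's source: the function name uppercase_multiplier, the regular expressions, and the binomial fallback for "other" capitalization schemes are assumptions based on my reading of the behaviour, and the fallback is chosen because it reproduces the 5:2 ratio between the two guess numbers quoted above.

```python
import re
from math import comb

# Assumed patterns for the "common" capitalization schemes.
START_UPPER = re.compile(r"^[A-Z][^A-Z]+$")  # e.g. Bananas
END_UPPER = re.compile(r"^[^A-Z]+[A-Z]$")    # e.g. bananaS
ALL_UPPER = re.compile(r"^[^a-z]+$")         # e.g. BANANAS


def uppercase_multiplier(token: str) -> int:
    """Approximate the capitalization multiplier for a matched token."""
    if token.lower() == token:
        return 1  # no capital letters, so no bonus
    if any(p.match(token) for p in (START_UPPER, END_UPPER, ALL_UPPER)):
        return 2  # common scheme: flat multiplier
    upper = sum(1 for c in token if "A" <= c <= "Z")
    lower = sum(1 for c in token if "a" <= c <= "z")
    # Otherwise, count the ways to pick up to min(upper, lower) letters to flip.
    return sum(comb(upper + lower, i) for i in range(1, min(upper, lower) + 1))


for token in ("Bananas", "bananaS", "12345Qwert", "12345qwerT"):
    print(token, uppercase_multiplier(token))
# Bananas 2
# bananaS 2
# 12345Qwert 5
# 12345qwerT 2
```

The leading digits stop 12345Qwert from matching the "first letter capitalized" pattern, so it falls through to the combinatorial branch, while 12345qwerT still matches the "last letter capitalized" pattern because digits count as non-uppercase characters.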

Evidence

The following table shows a sample of tokens from the common passwords list that exhibit the issue described, against their number of occurrences in Troy Hunt's Pwned Passwords as: all-lowercase; with their first letter capitalized; and with their last letter capitalized:

Password	Lowercase	Capitalized (First)	Capitalized (Last)
1q2w3e4r5t	1109333	5347	1021
1qaz2wsx	726341	3011	2271
123qwe	675027	2906	516
1q2w3e4r	598708	4685	3135
123abc	595847	1215	91
1q2w3e	473459	2171	363
1234qwer	385656	3860	815
12qwaszx	312045	848	459
50cent	179247	478	11

Now consider the guess number given by zxcvbn to these same tokens:

Password	Lowercase	Capitalized (First)	Capitalized (Last)
1q2w3e4r5t	291	1451	581
1qaz2wsx	29	169	57
123qwe	34	100	67
1q2w3e4r	193	769	385
123abc	224	670	447
1q2w3e	687	2059	1373
1234qwer	87	345	173
12qwaszx	350	2095	699
50cent	2248	8989	4495

In all cases shown, the more common capitalization scheme (first letter capitalized) is given the higher guess number. This is the opposite of what we want. If we strip all non-letter characters before doing any calculation in uppercase_variations, we get the following:

Password	Lowercase	Capitalized (First)	Capitalized (Last)
1q2w3e4r5t	291	581	581
1qaz2wsx	29	57	57
123qwe	34	67	67
1q2w3e4r	193	385	385
123abc	224	447	447
1q2w3e	687	1373	1373
1234qwer	87	173	173
12qwaszx	350	699	699
50cent	2248	4495	4495

Guess numbers are now appropriately low. This change will only ever result in a reduction in guess number, never an increase.
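The guess numbers in the tables above can be cross-checked with a little arithmetic. The sketch below assumes that each reported guess number equals a base value times the capitalization multiplier, plus one. That relationship is inferred from the figures themselves and is not stated anywhere in this issue. Under that assumption, the pre-fix multiplier for a capitalized first letter comes out equal to the number of letters in the token, while the multiplier for a capitalized last letter is always 2.

```python
# (password, lowercase, first-capitalized, last-capitalized) guess numbers,
# copied from the pre-fix table above.
REPORTED = [
    ("1q2w3e4r5t", 291, 1451, 581),
    ("1qaz2wsx", 29, 169, 57),
    ("123qwe", 34, 100, 67),
    ("1q2w3e4r", 193, 769, 385),
    ("123abc", 224, 670, 447),
    ("1q2w3e", 687, 2059, 1373),
    ("1234qwer", 87, 345, 173),
    ("12qwaszx", 350, 2095, 699),
    ("50cent", 2248, 8989, 4495),
]


def implied_multipliers(lower: int, first: int, last: int) -> tuple:
    """Recover (first, last) multipliers, assuming guesses = base * multiplier + 1."""
    base = lower - 1  # the lowercase form has multiplier 1
    return (first - 1) / base, (last - 1) / base


for password, lower, first, last in REPORTED:
    letters = sum(1 for c in password if c.isalpha())
    print(password, letters, implied_multipliers(lower, first, last))
```

For example, 1q2w3e4r5t has five letters, and (1451 - 1) / (291 - 1) = 5, whereas (581 - 1) / (291 - 1) = 2. In the post-fix table both capitalized columns match the old last-letter column, which corresponds to the flat multiplier of 2.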

Impact

Running a quick script over the common passwords list to pull out entries that both contain more than one letter and start or end with a non-letter sequence yields 10,310 results. Any password containing these as dictionary partitions will have its guess number affected if a capital letter occurs as the first or last letter in the string.
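The original script is not reproduced here, but the filter it describes can be sketched roughly as follows. The function name is_affected and the sample list are illustrative assumptions. A real run would read the library's common passwords list instead, and the exact count depends on details such as how "letter" is defined.

```python
import re

# Matches a non-letter at the very start or the very end of the entry.
EDGE_NON_LETTER = re.compile(r"^[^A-Za-z]|[^A-Za-z]$")


def is_affected(entry: str) -> bool:
    """True if the entry has more than one letter and a non-letter at either edge."""
    letters = sum(1 for c in entry if c.isascii() and c.isalpha())
    return letters > 1 and bool(EDGE_NON_LETTER.search(entry))


# Hypothetical sample standing in for the common passwords list.
sample = ["1q2w3e4r5t", "50cent", "bananas", "12345", "a1", "pass1word"]
affected = [entry for entry in sample if is_affected(entry)]
print(affected)  # ['1q2w3e4r5t', '50cent']
```

Entries such as a1 are excluded because a single letter is both first and last, and entries such as pass1word are excluded because their edges are already letters.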

Remediation

This issue is readily fixed by stripping non-letter characters before we do any computation in the uppercase_variations function. In the case of 12345qwert we now get a guess number of 1009 whether we capitalize the q or the t. I'll submit a PR straight after putting this issue in.
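A minimal sketch of the proposed change, in Python rather than the library's own language, might look like the following. It is not the code from the PR: the names multiplier and stripped_multiplier are hypothetical, and the multiplier itself is a simplified stand-in for uppercase_variations (flat 2 for common schemes, a binomial sum otherwise).

```python
import re
from math import comb

NON_LETTERS = re.compile(r"[^A-Za-z]")
# Assumed "common scheme" patterns: first-cap, last-cap, all-caps.
COMMON = [re.compile(p) for p in (r"[A-Z][^A-Z]+", r"[^A-Z]+[A-Z]", r"[^a-z]+")]


def multiplier(word: str) -> int:
    """Simplified stand-in for the existing capitalization multiplier."""
    if word.lower() == word:
        return 1
    if any(p.fullmatch(word) for p in COMMON):
        return 2
    upper = sum(1 for c in word if "A" <= c <= "Z")
    lower = sum(1 for c in word if "a" <= c <= "z")
    return sum(comb(upper + lower, i) for i in range(1, min(upper, lower) + 1))


def stripped_multiplier(token: str) -> int:
    """Proposed behaviour: drop non-letters, then apply the same multiplier."""
    return multiplier(NON_LETTERS.sub("", token))


for token in ("12345qwert", "12345Qwert", "12345qwerT"):
    print(token, multiplier(token), stripped_multiplier(token))
# 12345qwert 1 1
# 12345Qwert 5 2
# 12345qwerT 2 2
```

Stripping leaves the counts of uppercase and lowercase letters unchanged, so the combinatorial branch gives the same value as before. The only difference is that more tokens now qualify for the flat multiplier of 2, which is why the change can lower a guess number but not raise it.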

lambdacasserole mentioned this issue Jun 21, 2018

Strip non-letters before calculating multiplier #233

Open

MrWook mentioned this issue Jan 5, 2021

Still maintained? #290

Open
