Your "Strong" Password May Be Weaker Than You Think

If you’ve been relying on password meters to determine how strong your passwords are, we’ve got some bad news. Their strength measurements are highly inconsistent and may even be leading you astray, according to a new study from researchers at Concordia University:

“In our large-scale empirical analysis, it is evident that the commonly-used meters are highly inconsistent, fail to provide coherent feedback, and sometimes provide strength measurements that are blatantly misleading.”

Researchers Xavier de Carné de Carnavalet and Mohammad Mannan evaluated the password strength meters used by a selection of popular websites and password managers. The sites surveyed included Apple, Dropbox, Drupal, Google, eBay, Microsoft, PayPal, Skype, Tencent QQ, Twitter, Yahoo!, and the Russian-based email provider Yandex Mail; the researchers also looked at popular password managers including LastPass, 1Password, and KeePass. For diversity, they added FedEx and the China Railway customer-service center site.

De Carné de Carnavalet and Mannan then assembled a list of close to 9.5 million passwords from publicly available dictionaries, including lists from real-life password leaks, and ran them through those services to see how well their password-strength meters performed.

Ineffective Rules

Password strength meters typically looked for length and a variety of character sets (such as uppercase and lowercase letters, numbers, and symbols). Some also tried to detect common words or weak patterns.
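To make these composition rules concrete, here is a minimal Python sketch of a meter of this kind. The function name, the five labels, the point scheme, and the tiny word list are all hypothetical illustrations, not the logic of any meter in the study.

```python
# Hypothetical composition-based meter, for illustration only.
LABELS = ["very weak", "weak", "medium", "strong", "very strong"]
COMMON_WORDS = {"password", "qwerty", "football"}  # tiny illustrative list

def composition_score(password: str) -> str:
    """Rate a password by length and by how many character classes it uses."""
    classes = [
        any(c.islower() for c in password),      # lowercase letters
        any(c.isupper() for c in password),      # uppercase letters
        any(c.isdigit() for c in password),      # digits
        any(not c.isalnum() for c in password),  # symbols: anything not a letter/digit
    ]
    points = sum(classes)            # 0-4 character classes present
    if len(password) >= 8:
        points += 1                  # bonus point for meeting a length minimum
    if password.lower() in COMMON_WORDS:
        points = 0                   # only an exact common word is caught
    return LABELS[min(points, 4)]
```

Under these made-up rules, Password1 comes out "very strong" because it ticks three character classes and the length box, even though it is just a dictionary word with a digit appended. A meter that only counts boxes cannot tell the difference.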

However, the meters that analyzed password composition often ignored other easy-to-crack patterns and didn’t account for “leet” transformations, such as replacing the letter l with the number 1. Hackers, of course, routinely try these variations when cracking passwords.
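The gap is easy to see in code. This Python sketch compares an exact dictionary lookup with one that first undoes common leet substitutions; the substitution table, the four-word list, and both function names are illustrative assumptions, not taken from the study or from any site's meter.

```python
from itertools import product

# Illustrative leet table: each character maps to every letter it might stand for,
# plus itself so that unsubstituted passwords are still checked.
LEET_OPTIONS = {"0": "0o", "1": "1li", "3": "3e", "4": "4a",
                "5": "5s", "7": "7t", "@": "@a", "$": "$s"}
COMMON_WORDS = {"password", "football", "qwerty", "monkey"}  # tiny sample list

def naive_is_common(password: str) -> bool:
    """Exact lookup only, like a meter that ignores leet transformations."""
    return password.lower() in COMMON_WORDS

def leet_aware_is_common(password: str) -> bool:
    """Try every de-leeted reading of the password against the word list."""
    choices = [LEET_OPTIONS.get(c, c) for c in password.lower()]
    return any("".join(reading) in COMMON_WORDS for reading in product(*choices))
```

Here p4ssw0rd slips past the naive check but is caught by the leet-aware one. Trying every combination grows exponentially with the number of substituted characters, which is fine for a short illustration but not how a production checker would need to scale.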

Inconsistent Results

Confusingly enough, nearly identical passwords produced very different outcomes. For example, Paypal01 was considered poor by Skype’s standards, but strong by PayPal’s. Password1 was considered very weak by Dropbox but very strong by Yahoo!, and received three different ratings from three Microsoft checkers (strong, weak, and medium). The password #football1 was also considered very weak by Dropbox, but Twitter rated it perfect.

In some cases, minor variations also changed the assessment because of an overemphasis on minimum requirements: FedEx correctly rated password$1 very weak, but considered Password$1 very strong. Yahoo! considered qwerty a weak password, but qwerty1 strong.

Similar problems emerged with Google, which found password0 weak but password0+ strong. False negatives turned up as well: FedEx rated +^v16#5{]( a very weak password, apparently because it contains no capital letters.
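The FedEx results fit a meter that treats every composition rule as a hard gate. The following hypothetical Python sketch assumes such a rule set (at least eight characters plus lowercase, uppercase, digit, and symbol); it is a guess at the general pattern, not FedEx's actual code.

```python
def gated_score(password: str) -> str:
    """Hypothetical requirement-gated meter: any unmet rule means failure."""
    required = [
        any(c.islower() for c in password),      # lowercase letter
        any(c.isupper() for c in password),      # uppercase letter
        any(c.isdigit() for c in password),      # digit
        any(not c.isalnum() for c in password),  # symbol
    ]
    if len(password) < 8 or not all(required):
        return "very weak"    # one missing box -> bottom rating, nothing else counts
    return "very strong"      # every box ticked -> top rating
```

A single capital letter flips password$1 from the bottom rating to the top, while the length and randomness of +^v16#5{]( count for nothing because it lacks an uppercase character.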

“Some meters are so weak and incoherent (e.g., Yahoo! and Yandex) that one may wonder what purpose they may serve,” the researchers wrote.

Black Boxes, Black Boxes

De Carné de Carnavalet and Mannan argue that the opacity of password checkers works against them. It is also a problem for users, who are left confused by oddly inconsistent password-strength results with no explanation.

“Except Dropbox, and KeePass (to some extent), no other meters in our test set provide any publicly-available explanation of their design choices, or the logic behind their strength assignment techniques,” the researchers wrote.

With the exception of Dropbox and KeePass, the password meters appeared to be designed in an ad hoc manner, and often rated weak passwords as strong. As the researchers wrote: “Dropbox’s rather simple checker is quite effective in analyzing passwords, and is possibly a step towards the right direction (KeePass also adopts a similar algorithm).”

De Carné de Carnavalet and Mannan recommend that popular web services adopt a commonly shared algorithm for their password strength meters. In particular, they suggest using or extending the zxcvbn algorithm used by Dropbox, or the similar open-source approach used by KeePass.
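The core idea behind zxcvbn is to estimate how many guesses an attacker would need by covering the password with predictable pieces, rather than ticking composition boxes. The Python sketch below is a heavily simplified illustration of that idea, not zxcvbn itself: it handles only a ranked dictionary and per-character brute force, and the word ranks, the ten guesses per unmatched character, and the score thresholds are assumptions chosen for illustration.

```python
import math

# Hypothetical ranked word list (rank 1 = most common); ranks are made up.
RANKED_WORDS = {"password": 2, "qwerty": 4, "football": 10, "dragon": 20}
BRUTEFORCE_PER_CHAR = 10  # assumed guesses for each character no word covers

def estimate_guesses(password: str) -> float:
    """Fewest guesses to cover the password with dictionary words and raw characters.

    best[i] is the cheapest estimate for password[:i]; each step either
    brute-forces one more character or appends a dictionary word ending at i.
    """
    pw = password.lower()  # simplification: capitalization is ignored
    n = len(pw)
    best = [math.inf] * (n + 1)
    best[0] = 1.0
    for i in range(1, n + 1):
        best[i] = best[i - 1] * BRUTEFORCE_PER_CHAR
        for j in range(i):
            rank = RANKED_WORDS.get(pw[j:i])
            if rank is not None:
                best[i] = min(best[i], best[j] * rank)
    return best[n]

def score(password: str) -> int:
    """Map the guess estimate to a 0-4 score using illustrative thresholds."""
    guesses = estimate_guesses(password)
    for s, limit in enumerate((1e3, 1e6, 1e8, 1e10)):
        if guesses < limit:
            return s
    return 4
```

Because the estimate depends on how a password decomposes into predictable pieces, near-identical variants such as password$1 and Password$1 receive the same low score, while the random-looking +^v16#5{]( scores highest. The real zxcvbn goes much further, also modeling capitalization, leet substitutions, keyboard patterns, sequences, and dates.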
