The conventional wisdom is that passphrases are much more secure than passwords, even complex ones. Passphrases are likely more secure than passwords, but not as secure as many seem to think. Preliminary work by Joseph Bonneau and Ekaterina Shutova suggests that optimism about passphrase security needs to be tempered by the fact that users tend to choose phrases from common language rather than random phrases.

Bonneau and Shutova recently presented a paper at the Workshop on Usable Security. The paper, "Linguistic properties of multi-word passphrases" (PDF), looks at passphrases using data from Amazon's (now shuttered) PayPhrase system.

Amazon PayPhrase was a checkout system that was supposed to allow speedy checkout with a unique phrase and a four-digit PIN. The system wasn't created to test passphrases, but it provided a useful source of information for Bonneau and Shutova because it would reject any passphrase that had already been chosen by another user.

Generating Test Phrases

According to the paper, they tested more than 100,000 possible passphrases using an automated script to query the Amazon PayPhrase site.

The test phrases were assembled from a couple of sources. The first set simulated a dictionary attack "by assembling a number of lists of phrases that English-speaking users might be expected to pick." They pulled proper nouns from "top x" lists on Wikipedia and from lists of movie stars and top movies on IMDB, gathering more than 15,000 proper nouns. They also took idiomatic phrases from lists of sports phrases on Wikipedia, and common phrases from English Language Learning Online and the Urban Dictionary.

Next, Bonneau and Shutova used random samples of "naturally-occurring 2-word phrases of varying syntactic relation, extracted from the 100-million word British National Corpus parsed by the Robust Accurate Statistical Parser." All of the phrases generated here were two-word phrases, excepting a set of phrases that require a preposition (such as "pay in cash").

Finally, they also used Google's Ngram to generate lists of 10,000 bigrams and crawled Facebook to generate a list of 10,000 random names.
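
The measurement idea can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' script: the PhraseRegistry class, the normalize helper and the sample phrases are all hypothetical, and an in-memory set stands in for a service that refuses phrases already chosen by someone else. The case and whitespace normalization is an assumption, since the article does not say how the real system compared phrases.

```python
from typing import Iterable


def normalize(phrase: str) -> str:
    """Lowercase and collapse whitespace (an assumed comparison rule)."""
    return " ".join(phrase.lower().split())


class PhraseRegistry:
    """Hypothetical stand-in for a service that enforces unique phrases."""

    def __init__(self) -> None:
        self._taken: set[str] = set()

    def register(self, phrase: str) -> bool:
        """Return True if the phrase was free, False if already taken."""
        key = normalize(phrase)
        if key in self._taken:
            return False
        self._taken.add(key)
        return True

    def is_taken(self, phrase: str) -> bool:
        """The uniqueness check itself reveals whether a phrase is in use."""
        return normalize(phrase) in self._taken


def hit_rate(registry: PhraseRegistry, candidates: Iterable[str]) -> float:
    """Fraction of candidate phrases that some user has already chosen."""
    candidate_list = list(candidates)
    if not candidate_list:
        return 0.0
    hits = sum(registry.is_taken(c) for c in candidate_list)
    return hits / len(candidate_list)


if __name__ == "__main__":
    registry = PhraseRegistry()
    for chosen in ["operation room", "send immediately"]:
        registry.register(chosen)

    guesses = ["Operation Room", "purple foxtrot", "send immediately", "pay in cash"]
    print(f"hit rate: {hit_rate(registry, guesses):.2f}")
```

The higher the hit rate for a given candidate list, the more useful that list would be to an attacker building a dictionary.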

Natural Language Rules

Are passphrases more secure than passwords? Almost certainly. But how secure they actually are is in question, especially when an attacker can conduct an offline attack.

Bonneau and Shutova have found, so far, that users tend to choose phrases that are found in everyday language rather than something like "purple foxtrot wombat ballerina." This means attackers can assemble an effective corpus of phrases for dictionary attacks.

As Bonneau wrote, "clear trends emerged--people strongly prefer phrases which are either a single modified noun ("operation room") or a single modified verb ("send immediately"). These phrases are perhaps easier to remember than phrases which include a verb and a noun and are therefore closer to a complete sentence. Within these categories, users don't stray too far from choosing two-word phrases the way they're actually produced in natural language."

Given a real-world scenario like Amazon PayPhrase, Bonneau says the results are "discouraging." While more secure than standard passwords, "by our metrics, even 5-word phrases would be highly insecure against offline attacks, with fewer than 30 bits of work compromising over half of users. The returns appear to rapidly diminish as more words are required. This has potentially serious implications for applications like PGP private keys, which are often encrypted using a passphrase. Users are clearly more random in 'passphrase English' than in actual English, but unless it's dramatically more random the underlying natural language simply isn't random enough."
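
To put the 30-bit figure in perspective, here is a rough back-of-the-envelope comparison in Python. The 7,776-word list (the size of a Diceware-style list) and the guess rate of one billion per second are assumptions chosen for illustration, not figures from the paper.

```python
import math


def random_phrase_bits(wordlist_size: int, num_words: int) -> float:
    """Entropy in bits of a phrase whose words are drawn uniformly at random."""
    return num_words * math.log2(wordlist_size)


def seconds_to_exhaust(bits: float, guesses_per_second: float) -> float:
    """Time to try every one of 2**bits guesses at a fixed rate."""
    return 2 ** bits / guesses_per_second


if __name__ == "__main__":
    ASSUMED_WORDLIST_SIZE = 7776   # assumption: Diceware-style list
    ASSUMED_GUESS_RATE = 1e9       # assumption: offline guesses per second

    natural_bits = 30.0            # "fewer than 30 bits of work" from the quote
    random_bits = random_phrase_bits(ASSUMED_WORDLIST_SIZE, 5)

    print(f"natural-language 5-word phrase: about {natural_bits:.0f} bits")
    print(f"uniformly random 5-word phrase: about {random_bits:.1f} bits")
    print(f"seconds to exhaust 30 bits: {seconds_to_exhaust(natural_bits, ASSUMED_GUESS_RATE):.2f}")
```

Under these assumptions, 2^30 guesses take about a second, while five words drawn uniformly at random from the assumed list would carry more than twice as many bits.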

In a prior post, Bonneau wrote "a really strong password is one that nobody else has ever used."

Bonneau and Shutova recommend further study and collaboration between security experts and linguists. In the meantime, if you're choosing a passphrase or setting policy on passphrases, the recommendation is to use randomly generated multi-word phrases rather than user-chosen ones.
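
For readers who want to see what random generation looks like in practice, here is a minimal Python sketch. The generate_passphrase function and the tiny sample word list are hypothetical and only for illustration; a real deployment would assume a list of several thousand words. The sketch uses Python's standard secrets module so the choices come from a cryptographically secure random source.

```python
import secrets
from typing import Sequence


def generate_passphrase(wordlist: Sequence[str], num_words: int = 4) -> str:
    """Join num_words words chosen independently and uniformly at random."""
    if num_words < 1:
        raise ValueError("num_words must be at least 1")
    if not wordlist:
        raise ValueError("wordlist must not be empty")
    return " ".join(secrets.choice(wordlist) for _ in range(num_words))


if __name__ == "__main__":
    # Assumption: a toy list for demonstration only. It is far too small
    # to produce a secure passphrase.
    sample_words = ["purple", "foxtrot", "wombat", "ballerina",
                    "granite", "lantern", "meadow", "sprocket"]
    print(generate_passphrase(sample_words, num_words=4))
```

Because each word is picked independently of the others, the result does not follow the natural-language patterns the study found in user-chosen phrases.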