Hashing It Out: What You Need to Know About Passwords

In my last article, I mentioned that the best way to protect yourself online is to use a good password manager and unique passwords for all your websites. However, I didn’t elaborate much on exactly why that’s the case. As such, this time on I Promise I’ll Come Up With a Name for This Column Eventually, I’ll be covering the good, the bad, and the ugly of how we log in to websites.

There’s a myriad of ways that you can secure your online accounts, but we’ll start with arguably the most common: the humble password. The concept of a password is simple: a string of characters and words known only to you, which the website in question checks against its database. However, passwords are actually deceptively complicated in two major ways: how to define a “strong” password, and how the website actually checks that you have the right one.

1: Creating the Password

A “strong” password is, in short, one that is hard to guess. But how do you define “hard to guess”? You could define it strictly by length: after all, the more characters you have in the password, the harder it is to pick the right sequence. However, length doesn’t tell the whole story: “swarthmorecollege” and “y2oUcgM>$c2Y,8Z4p” both have seventeen characters, but the former is a much weaker password (especially if you’re using it for your Swarthmore account!). The best way to define “password strength” is in terms of password patterns; for instance, “uncommon dictionary word with three digits at the end,” or “four random common words,” or “eight characters of pure random gibberish.” Then you can calculate how many passwords you can generate from those patterns. As an example, if you had a password that was just four digits (like, say, your smartphone passcode), you would have 10 x 10 x 10 x 10 = 10,000 possible passwords. Working in much the same way, we can calculate that for those three patterns mentioned earlier, there are 171 million (uncommon word with three digits), 16 trillion (four words), and 6 quadrillion (gibberish) possible passwords.
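If you’d like to check that arithmetic yourself, here is a minimal Python sketch. It uses the list sizes given in the final notes of this column (171,476 dictionary words, a 2,000-word common list, and 94 printable characters); the variable names are made up for illustration.

```python
# Sizes taken from the column's final notes; the names are illustrative.
UNCOMMON_WORDS = 171_476   # words in current usage per the OED
COMMON_WORDS = 2_000       # assumed size of a "common words" list
PRINTABLE_ASCII = 94       # printable ASCII characters, excluding the space

# Each pattern's count is the product of the choices at every position.
patterns = {
    "four-digit passcode": 10 ** 4,
    "uncommon word + three digits": UNCOMMON_WORDS * 10 ** 3,
    "four random common words": COMMON_WORDS ** 4,
    "eight random characters": PRINTABLE_ASCII ** 8,
}

for name, count in patterns.items():
    print(f"{name}: {count:,} possible passwords")
```

The point of the sketch is that the pattern, not the length, determines how many guesses an attacker needs.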

You might notice from those numbers that the random-common-words pattern allows far more possible combinations than the dictionary word with numbers tacked on, which might seem counterintuitive because “take a simple word and add lots of crap to it” is the recipe that many websites make you use to satisfy their password requirements. Those password requirements are based on a 2003 government publication (which was itself mostly based on a random report written in the 1980s) that the author later admitted was completely incorrect. To paraphrase the linked article, these strange password requirements didn’t take into account the fact that the requirements are far too onerous and simply lead to users circumventing the security measures (writing passwords on post-its and the like). As such, when not using a website that has atrocious password requirements, you should follow the new government publication, which recommends long passwords composed mostly of dictionary words. This is my personal favorite password generator, which creates passwords like “Machinery-Sock-Party-Active-4” or “Deer-Long-Intention-Complication-0.” The numbers don’t really make anything much more secure but are useful to satisfy websites with stupid password demands. (Here’s looking at you, Swarthmore login!) Alternatively, you can use the password generator embedded in your password manager. Did I mention that you should be using a password manager? Because you should be using a password manager.
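To give a feel for how a generator like that might work, here is a small Python sketch. It is not the code behind any real generator: the tiny `WORDS` list and the `make_passphrase` function are hypothetical, and a real tool would draw from a list of thousands of words. The one detail that matters is using the `secrets` module, which is designed for security-sensitive randomness, instead of `random`.

```python
import secrets

# Hypothetical stand-in word list; a real generator would use thousands of words.
WORDS = [
    "machinery", "sock", "party", "active",
    "deer", "long", "intention", "complication",
]

def make_passphrase(words=WORDS, n_words=4, separator="-"):
    """Pick n_words at random, capitalize them, and append one random digit."""
    chosen = [secrets.choice(words).capitalize() for _ in range(n_words)]
    digit = str(secrets.randbelow(10))  # only there to appease picky websites
    return separator.join(chosen + [digit])

print(make_passphrase())
```

Each run prints a different passphrase in the same shape as the examples above.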

2: Storing the Password

Password manager notwithstanding, we’ve covered one half of password-based authentication: the creation of the password. Now, let’s move on to the other half: what exactly happens to that password after you type it in? The answer is more complicated than you might think and can take a variety of different forms depending on how good the website’s designers are at their job.

There are a number of different ways to build a password authentication system. The simplest and most obvious way is to keep a big database of everyone’s usernames and associated passwords. Whenever anyone tries to log in, just look up their username and compare the passwords. This is a very easy approach, which is why it’s in use by a lot of old websites that don’t know any better. The fundamental problem with storing passwords like this — in plain text, accessible to anyone who can read the database — is that if a hacker breaks in, it’s trivially easy for them to steal every single username and associated password and try them out wherever they please. Depending on how terrible the broader website security is, a hacker might even be able to trick the website into accidentally handing them a bunch of passwords by exploiting a software glitch. (This happened to “social application site” RockYou in 2009, resulting in the leak of 30 million usernames and passwords.) A website actually knowing a user’s password is a pretty massive security flaw, which is why if you ever receive an email from a website that contains your actual password then you shouldn’t entrust that site with your data unless you have to.

But wait: how is a website supposed to verify your password if it doesn’t store your password? The answer lies in a lovely bit of mathematical wizardry known as a hash function. A hash function takes in arbitrary data and outputs complete gibberish. However, there are two critical things about hash functions that make them very useful in password storage: they always produce the same output for a given input, and it is effectively impossible to reconstruct the input from the output. Programmers can use this knowledge to build a more secure password system. Whenever someone sets their password, the system runs it through the hash function and then stores the resulting hash in its database before purging the original password from memory. When the user returns, the website hashes whatever they enter and compares it to the entry in the database. That way, the server has absolutely no idea what the user’s password is, and even if its entire database is stolen by hackers, they won’t be able to actually use anything in it.
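Here is roughly what that looks like in Python. This is a deliberately bare-bones sketch: the in-memory `database` dictionary and the function names are invented for illustration, and plain SHA-256 is used only because it is a familiar hash function, not because a real site should stop there.

```python
import hashlib
import hmac

# Hypothetical stand-in for the website's user table: username -> stored hash.
database = {}

def hash_password(password: str) -> str:
    """Same input always gives the same output; the output can't be reversed."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

def set_password(username: str, password: str) -> None:
    # Only the hash is stored; the password itself is never written down.
    database[username] = hash_password(password)

def check_password(username: str, attempt: str) -> bool:
    stored = database.get(username)
    if stored is None:
        return False
    # compare_digest compares without leaking timing information.
    return hmac.compare_digest(stored, hash_password(attempt))

set_password("alice", "Machinery-Sock-Party-Active-4")
print(check_password("alice", "Machinery-Sock-Party-Active-4"))
print(check_password("alice", "swarthmorecollege"))
```

Notice that `database` never contains the password, yet the site can still tell a right guess from a wrong one.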

Except it’s not quite that easy, and we’re missing one final step. The great thing about a hash function is it always provides the same output for a given input. The downfall of a hash function is likewise that it always provides the same output for a given input. Since most passwords are fairly common strings (“aaaaaaa,” “password,” and so on), a hacker can just hash a bunch of those in advance (or use publicly available tables) and search through their stolen database for any hashes that match. The way to deter this is through the use of a salt: a chunk of random data that’s appended onto the end of your password before it’s fed into the hash function. Putting it another way, it’s pretty likely that someone has pre-calculated the hash of “password,” but much less likely that there’s a premade hash for “password.NxnU7B@8kxR.” The random data is then stored next to the hash result so that when verification is needed, the system adds the random data onto the provided password and compares that with the existing hash.
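A salted version can be sketched in a few lines of Python. One swap to be upfront about: instead of literally gluing the salt onto the end of the password, this sketch hands both to PBKDF2, a standard function in Python’s `hashlib` that mixes the salt in and repeats the hash many times. The iteration count and function names are illustrative assumptions, not a recommendation.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor; real systems tune this

def new_record(password: str):
    """Return (salt, digest); both get stored side by side in the database."""
    salt = os.urandom(16)  # fresh random salt for every password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify(password: str, salt: bytes, stored_digest: bytes) -> bool:
    """Re-run the hash with the stored salt and compare the results."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored_digest)

# Two users with the same weak password end up with different stored hashes.
salt_a, digest_a = new_record("password")
salt_b, digest_b = new_record("password")
print(digest_a != digest_b)
print(verify("password", salt_a, digest_a))
```

Because every record has its own salt, a table of pre-calculated hashes is useless: the attacker would have to redo the work for each user separately.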

3: Beyond the Password

A well-protected password (which is sadly a pretty rare sight, especially in old systems) is very, very difficult to crack with current technology. At least, it’s pretty difficult to get out of the database of the company that stores it (or its hash, as it were). But if an attacker manages to get the password — whether by exploiting some mistake the company made, hacking a poorly secured website where you used the same password, or tricking you into handing it over through a phishing scheme — then all this security is for naught. So, if we can’t even assume that a user who presents the correct username and password isn’t a nefarious actor, where do we go from here?

The answer is via multi-factor authentication (MFA). A “factor” is anything you can use to prove your identity, and they fall into three distinct categories: “something you have,” such as a smartphone; “something you know,” such as a password; and “something you are,” which covers biometric data like facial or fingerprint recognition. MFA is the simultaneous use of methods in multiple categories. This is a superset of two-factor authentication (2FA), which most commonly refers to a password as “something you know” and a smartphone that receives a text, generates a code, or (in the Duo app we all know and love) sends you a notification as “something you have.” But 2FA can be more diverse than that: consider swiping your credit card (something you have) and writing a signature (the muscle memory being, effectively, something you are). Note that a process that resets your password by sending your phone an authentication code isn’t a two-factor authentication system, because you don’t actually need to know the password, you just need to have the phone.

The specific breed of 2FA used by plenty of websites works because it’s considerably less likely that a hacker thousands of miles away has access to your phone. But how does your phone generate the right code? Well, two-factor authentication codes that are received via a text message are kind of boring (the server just generates some random numbers and sends them to you), and are also somewhat vulnerable to attack. The more interesting kinds are the ones generated by an app such as Google Authenticator. (The Duo Mobile app is a similar concept, but it’s distinct from what I’m talking about here.) These are generated using a method known as Time-based One-Time Password (TOTP). While it most commonly runs on your phone, it can also run on a laptop or desktop or even a dedicated keychain device that does nothing but generate passwords. It doesn’t even require an internet connection. All it needs is the secret key provided by the website when you first set up TOTP, which was transferred to your phone when you scanned the QR code provided by the website, as well as a rough idea of what time it is. Both the server and your TOTP device feed the shared secret key and a small chunk of data based on the current time into a keyed hash function, and then the server checks what you entered against what it thinks the result should be. The code is changed every thirty seconds, though the server will accept a code from a minute or two in either direction to compensate for devices’ clocks getting out of sync.
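For the curious, here is a rough Python sketch of how such a code can be computed. It follows the published TOTP standard (RFC 6238) as I understand it: the “small chunk of data” is the number of thirty-second steps since 1970, and the scrambling is done with a keyed hash (HMAC-SHA1). The function names, the raw-bytes secret (real apps usually exchange it in base32), and the one-step `window` are illustrative assumptions, not how any particular website does it.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional

def totp(secret: bytes, at_time: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Compute a time-based one-time code from a shared secret."""
    if at_time is None:
        at_time = time.time()
    counter = int(at_time) // step             # which 30-second window we're in
    message = struct.pack(">Q", counter)       # counter as 8 big-endian bytes
    digest = hmac.new(secret, message, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                 # last nibble picks a 4-byte slice
    number = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(number % 10 ** digits).zfill(digits)

def verify_totp(secret: bytes, submitted: str, at_time: Optional[float] = None,
                step: int = 30, window: int = 1) -> bool:
    """Accept codes from `window` steps either side to allow for clock drift."""
    if at_time is None:
        at_time = time.time()
    return any(
        hmac.compare_digest(totp(secret, at_time + i * step, step), submitted)
        for i in range(-window, window + 1)
    )

shared_secret = b"12345678901234567890"  # test secret from RFC 6238, not a real key
print(totp(shared_secret))
```

Both sides run the same function with the same secret and (roughly) the same clock, so they arrive at the same code without ever talking to each other.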

You should use some form of 2FA on any website which supports it, which nowadays is most major websites (including Google, Instagram, PayPal, and the like). But remember that just like any authentication method, nothing will make your account hack-proof. In fact, there’s absolutely no way to make a hack-proof account without making it impossible for you to log in! The best you can do is have enough layers of security to make breaking into your accounts as difficult as possible, while still keeping things simple for your day-to-day life. And that idea brings us back to where this column ostensibly began: password management.

4: Password Managers

A password manager is, in the broadest sense, something that stores your passwords outside of your brain. Have all your passwords in a text file somewhere? Technically, that’s a password manager! (It’s a terrible idea, though.) However, when I say “you should get a password manager,” I specifically mean an online password manager. These sync your passwords across all of your devices (including smartphones) and tend to come with a multitude of other features such as password generation or document storage. Any competent password manager will also include end-to-end encryption, which means that your passwords are encrypted before they are sent to the server, and their online service essentially acts like an overgrown Google Drive for the chunks of password data. The upshot of this is that even if the company were to be breached, the attackers would just end up with a lot of useless gibberish sealed in the magical envelopes from the last article. The only way to get your passwords is to decrypt them using your master password.

The biggest way a password manager makes you more secure is lowering the blast radius of a data breach. I have well over a hundred different online accounts — if any one of those gets hacked, and wasn’t taking the security measures I discussed above, then any other account I used with the same password is now at risk. Obviously, I cannot remember 200 different passwords, but my password manager can! And instead of using memorable passwords, I can use passwords that are as strong as my heart desires. Furthermore, while Google Chrome and Apple have their own functions for auto-filling of passwords, using a third-party password manager lets you sync seamlessly across all major browsers and operating systems. While it might take you an hour or two to transfer all your passwords into the password manager for the first time, after that’s done, it makes the sign-up process for a website about thirty seconds longer in exchange for letting you sign in instantly every subsequent time.

I’ve been using a password manager (specifically, 1Password) since 2018. Their individual plan will run you $36 a year, or it costs my family about $60 per year for a family plan which includes five accounts and the ability to share certain passwords (like your home wifi) between members. There are good password managers out there for less or even for free (like Bitwarden), but I haven’t used any of them, so I’m not qualified to make a recommendation. For more info on those, check out the links below.

If you’re looking for a password manager, 1Password will serve you well, but I haven’t done nearly enough research to be able to endorse it over any others. I recommend you consult reviews (PCMag, CNET, New York Times) and make an informed decision based on your particular use case and price point.

If you’ve managed to stick with me for this long, then thanks for reading! You’ve hopefully learned a thing or two about the surprisingly in-depth world of password and account security. Of course, one article barely scratches the surface of the topic; I didn’t cover how those “log in with Google/Apple/Facebook” buttons work, or how you can log into Moodle by entering your Swarthmore credentials (without Moodle ever knowing your password), or differences in hashing algorithms, or a million other things. However, I find this sort of thing absolutely fascinating, and I hope that I’ve managed to get you to feel much the same way. Now go use your newfound knowledge to get a password manager.

Some final notes:

The discussion of password strength is mostly based on that one xkcd comic and the associated commentary wiki. The possible password combinations were calculated as follows: “uncommon word” is one of the 171,476 words in current usage according to the Oxford English Dictionary, “common words” assumes a list of 2,000 words, and “random gibberish” is a random assortment of the 94 printable ASCII characters (not including the space).
There’s actually even more about password storage that I didn’t cover (like peppering and work factors). If you want to learn more about those, here’s a fairly accessible explanation.
For more info about password manager security, 1Password has released a whitepaper which does an exhaustive dive into all the cryptography and security mechanisms present in their product. It’s well-written and provides a great overview of secure design that remains user-friendly, but it’s not exactly for the faint of heart. LastPass and Dashlane also have similar whitepapers.

If you have any further questions, would like to see a column on a specific topic, or think that I got something wrong, feel free to email me at zrobins2@swarthmore.edu. You can also DM me on Instagram @software.dude.