How we share sensitive information securely and why we have to worry about it.
When you send a letter, it is sealed in an envelope and handled by only a few people. It can be steamed open, read, re-sealed, and sent on its way, but doing so requires significant effort. When you made a telephone callin the period where telephones were wired, didn’t route calls over the Internet, and trunk lines were no longer common your telephone’s wires where physically connected via a set of switches to the wires of the other telephone. People could insert a listening device into the circuit, but only at the expense of being physically present at the right time.
With both post and wired telephones, stealing enough communications to profit off of ordinary folk typically required a physical presence and investment that could be tracked relatively easily by law enforcement. These mediums of communication were not perfect, but were secure enough for day-to-day business.
But the Internet is different.
The core operation of the Internet is the Internet Protocol, or IP. Communication via IP can be thought of as the following:
There is a big room filled with people, each with a digital cameras, a printer, and a set of pneumatic tubes. These people aren’t you or I; we’re in other rooms at the other ends of those tubes. If I want to send you a message I do something like the following:
I write a message on a piece of paper, including your name on the top
I put the message in my tube, getting it to my person in that big room
That person holds up my message, showing it to one of the other people in that room, one next to my person but closer to your person.
That person takes a picture of the message and prints out a copy; my person then can file my message or discard it, whatever seems best.
The person that took the picture prints it off and shows it to another person, who takes a picture, prints it off, and shows it to another person, who… etc.
Eventually your person sees it, takes a picture, prints off two copies and sends one via the tube to you.
There are technical reasons why this is a really good way to design a computer network, but they are not important to this post. The keys observations are that there are no envelopes to open, there are a lot of people involved in the communication, and everyone gets their own copy of my letter.
The Internet is open. By design, it is relatively easy for those in the know to get copies of almost any Internet communication they want to have, and often without doing anything they aren’t supposed to do.
How do you communicate privately when people are listening in? Any grade-school student who passes notes that the teacher might intercept and read to the class knows the answer: you write in code.
There are two problems here. First, most codes are easy to break by computer, even incredibly intricate and complex ones that are hard to break by hand. And second, the secret of the code has to be known by both sender and receiver, and how do we communicate that many secrets? I suppose I could send an online merchant a message “Hey, here’s the code we’ll use for our next letter” but how do I encode that first message so other people don’t read it?
The answer lies in a particular class of functions. In particular, two-argument functions taking a message and a “key”.
First, let’s observe that it is relatively easy to turn any message into a big number. For example, I could say A is 01, B is 02, … Z is 26, and space is 00. Stick all the digits together and “so sneaky” becomes the number 191500191405011125. This is not the encryption technique, it’s just how we’ll get away with using math instead of text-based rules.
Now let’s define a function m2 = f(m1, k). This will be both our encrypter and our decrypter, by using pairs of keys that undo each other.
Functions that have key pairs like this are commonplace. For example, simple multiplication m2 = m1 × k has that property: if you encrypt with k =
37 |
85 |
85 |
37 |
The breakthrough in public key cryptography
was the discovery of functions that are like multiplication
in that if m2 = f(m1, k1)
then m1 = f(m2, k2)—that is, there are pairs of keys that undo one another—but
for which finding one key given you know the other is very hard
“Hard” (or NP-hard) is a formal term in computing;
the most common encryption function in use today is not quite NP-hard, but are close.
Even without being NP-hard,
finding k2 given k1
with current encryption techniques
takes (on average) years of dedicated compute time.
.
However, even though finding k2 given k1 is hard,
finding a pair (k1, k2) together is easy.
With this breakthrough we can communicate securely.
The trick to secure communication is in the correct use of public keys.
I generate a key pair, k1 and k2.
I send you a message (unencrypted) saying “Let’s talk privately; use k2 to encrypt messages you send me.”
You generate a key pair, k3 and k4.
You send me a message, encrypted using the k2 I sent you, saying “OK; you should use k4 to encrypt messages you send me.”
I decrypt your message using k1, which I have never shared with anyone.
I can then send you messages encrypted with k4, which only you can decrypt because you are the only one with k3; and you can respond with messages encrypted using k2, which only I can decrypt because I am the only one with k1
Incidentally, we call keys k2 and k4 the “public keys” because we shared them with each other; keys k1 and k3 are thus the “private keys” because we don’t share them. Since every form of encryption has secrets (like our private keys), this form of encryption is named for the other kind of key: “public-key encryption”.
The above protocol is somewhat simplified from how actually secure Internet communication works. There are extra steps to prevent some third party from sending you messages as if they were me, and we sometimes use public-key encryption to send a key for a different kind of encryption and then switch to that encryption mechanism instead, but the basic idea is there.
It is possible to break encryption. The basic technique is to generate random pairs of keys in hopes of generating a pair that includes the public key you want to break. There are some tricks to speed this up a bit, but the basic approach is about that inefficient. As long as keys are big enough—and there is no intrinsic reason why they can’t be as big as we want—there is little if any hope of breaking them in any realistic time frame.
But who needs to break a really big numerical key when the key is being used to conceal a much simpler-to-guess password? And who needs to guess a password when people use the same password for every site they visit (including the fake website created by those who want to access your accounts), or emails themselves the password (given almost no email today is encrypted), or responds to email pretending something is wrong with an account and asking for the password, or the password is protected by answers to security questions that are part of public record? And who needs to even guess any of that when they can just call up the support number and say “I was locked out of my email; can you reset my account for me?”
Encryption is quite good. It might yet turn out to be less secure than we thought it was, but it is far from the weakest link in online transactions today.
Looking for comments…