Digital information is transferred as electric currents through wires that may contain impurities which alter the currents, or as electromagnetic waves through space where ambient radiation can alter the waves, yet digital information depends entirely on precision. Unlike analog information, digital information is ruined by even a little noise. For example, say I shouted “Hello!” to you from far away on a windy day. The sound waves in the air constitute analog information, and the wind along the way can alter the waves and therefore perturb the information. You might hear something distorted, but it would probably still resemble the original “Hello!” message. Now suppose I sent the text message “Hello!” to you via instant messaging. In decimal, the ASCII string “Hello!” (excluding the quotation marks) is 072 101 108 108 111 033, which is actually sent in bytes (blocks of 8 bits) in binary as 01001000 01100101 01101100 01101100 01101111 00100001. Now imagine that some noise (perhaps radiation) in the air altered even one of the bits. Suppose in the byte containing the letter “o”, the bits changed from 01101111 to 00101111 (the first 1 was changed to a 0). Then the message received would be “Hell/!”, a completely different message! And that’s from just one altered bit! If billions of messages are transferred every day, shouldn’t there be an overwhelming number of confusing messages all the time? And that’s just the tip of the iceberg: the messages transmitted could be in machine language telling some server to do a specific thing, and an altered bit could change its actions entirely. How come our computer-controlled devices work properly most of the time? Maybe the radiation in the air and the imperfections in the wires that we use to transmit digital information actually don’t corrupt the messages we send. This is not at all true. Instead, we use error correcting codes to ensure that our messages are received as intended.
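To see this concretely, here is a short Python sketch (the variable names are my own) that flips a single bit in the ASCII encoding of “Hello!”:

```python
# Encode "Hello!" as a string of ASCII bits, flip one bit, and decode.
msg = "Hello!"
bits = "".join(f"{ord(c):08b}" for c in msg)  # 8 bits per character

# Flip the first 1 in the byte for "o" (the 5th byte, so bit index 8*4 + 1):
i = 8 * 4 + 1
flipped = bits[:i] + ("0" if bits[i] == "1" else "1") + bits[i + 1:]

# Reassemble the bytes into characters.
garbled = "".join(chr(int(flipped[j:j + 8], 2)) for j in range(0, len(flipped), 8))
print(garbled)  # → Hell/!
```

One flipped bit out of 48 turns “o” (01101111) into “/” (00101111), exactly as described above.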

There is no way to avoid having our digital messages altered on the way to their recipients. Instead, we use an idea which we use all the time when we want to make sure our vocal messages are heard properly. The idea is to add redundancy to the message so that the receiver can check if the received message was altered or not. This is called error detecting. If, in addition, the receiver can deduce what the intended message was supposed to be, then this is called error correcting.

For example, let’s say I was shouting to you from far away on a windy day. What if I fell into a river and was shouting “Help!” but you heard it as “Hello!”? That would be quite an unfortunate end for me. To ensure that you recognize when my message has been altered, we could agree beforehand that all messages will be shouted twice in a row. Then, if you hear two consecutive messages that differ, you can conclude that my message was altered. So if I got in trouble, I would shout “Help! Help!” If there was wind interference, you might hear something like “Hello! Help!” You would know right away that my original message was altered: you have detected an error. However, you still cannot correct it, because you don’t know which of the two was my intended message. In fact, both of them could have been altered! Maybe my original message was “Hell! Hell!” What if I wanted to shout “Help! Help!” but you heard it as “Hello! Hello!”? In that case I would be doomed once again, so this protocol doesn’t work perfectly. However, note that the probability that both messages are altered in the exact same way is quite small, and I can make it as small as I want by using enough repetitions.
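As a toy illustration, the shout-twice protocol might be sketched like this (the function names are hypothetical, not from any standard library):

```python
def send(message, n=2):
    """Repeat the message n times before sending."""
    return [message] * n

def has_error(received):
    """If the received copies disagree, at least one was altered."""
    return len(set(received)) > 1

print(has_error(["Help!", "Help!"]))   # False — the copies agree
print(has_error(["Hello!", "Help!"]))  # True — something was altered in transit
```

Note that `has_error` only *detects* a problem; as the paragraph above explains, with two copies there is no way to tell which one was intended.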

Now, let’s talk about the idea behind error correction. It is actually something our brains use all the time, subconsciously, whenever we listen to people speak or read poorly typed comments on YouTube. The idea is so important it deserves a special box:

Only a subset of all possible messages we see or hear make sense (or are valid), and so when we see or hear a message that doesn’t make sense, in our minds we replace it with the closest valid message.

Now we give a digital example. Let’s say I want to send you messages one bit at a time, so a sample message could be 0 or 1. An error correcting protocol could be to triple all my messages, so that my sample message would be sent as 000 or 111. Among the 8 possible binary strings of length 3 that you could receive, only two (000 and 111) are valid. Our protocol can therefore detect up to 2 errors in any given message. If there are three errors, then 000 would become 111 and vice versa, so we wouldn’t know there was an error. However, if we receive a message like 011, we’ll know something went wrong. Furthermore, our protocol can correct up to 1 error: if there was only 1 error, then the intended message is whatever the majority bit is. For instance, if we receive 001, then the original message had to have been 000 if there was only 1 error. To summarize, if there are 1 or 2 errors, our system can detect them; if there is exactly 1 error, our system can correct it.
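A minimal sketch of this triple-repetition protocol in Python (the function names are my own) could look like:

```python
def encode(bit):
    """Triple the bit: "0" -> "000", "1" -> "111"."""
    return bit * 3

def is_valid(word):
    """Only 000 and 111 are valid codewords."""
    return word in ("000", "111")

def correct(word):
    """Majority vote; recovers the codeword if at most 1 bit was flipped."""
    return "111" if word.count("1") >= 2 else "000"

print(is_valid("011"))  # False — an error is detected
print(correct("001"))   # 000 — a single flipped bit is corrected
print(correct("011"))   # 111 — wrong answer if the sender meant 000 with 2 errors
```

The last line shows why the protocol corrects only 1 error even though it detects 2: with two flipped bits, majority voting confidently picks the wrong codeword.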

It is enlightening to think about error correcting geometrically. We measure distance between strings by the number of bits in which they differ, so for example 010 and 001 have a distance of 2 because they differ in the 2nd and 3rd bits. This can be easily seen by viewing the 8 binary strings of length 3 as vertices of a cube, where the strings dictate the coordinates, so 000 corresponds to the point (0,0,0) and 011 corresponds to (0,1,1):

Possible messages as vertices of cube. Only 000 and 111 are valid messages.

The distance measure is called the Hamming distance, named after Richard Hamming. The distance can be easily seen as the smallest number of steps it takes to travel between two points by traveling along the edges of the cube. Note that our two valid messages are opposite vertices of the cube and that they are a distance of 3 apart. Therefore, as long as there is 1 error (meaning we’ve strayed from a valid message by a distance of 1), then it is clear what the original message is by choosing the nearest valid codeword. In this case, since the two valid messages are a distance of 3 apart, an invalid message can only be a distance of 1 away from exactly one of the valid messages. In technical lingo, the set {000, 111} of valid messages forms the code, the individual valid messages are codewords, and the distance of the code is the smallest distance between two points in the code. In this case, there are only two points in the code, and the distance of the code is 3. In general, if the distance of the code is d, then the code can

  • detect up to d-1 errors; this is because the shortest distance between two valid codewords is d, so traveling up to d-1 away from a valid codeword lands you on an invalid word, but traveling a distance of d from a valid codeword could land you on another valid codeword
  • correct up to \left\lceil \frac{d}{2} \right\rceil - 1 errors (where \left\lceil \cdot \right\rceil is the ceiling function, which rounds up to the nearest integer); this is because if you travel up to \left\lceil \frac{d}{2} \right\rceil - 1 away from a valid codeword, that codeword is still the unique nearest valid codeword, since the shortest distance between two valid codewords is d. Traveling any farther could leave you at least as close to a different codeword; for example, when d is even, traveling a distance of \frac{d}{2} could land you on a midpoint between two valid codewords, in which case it is ambiguous which one is the nearest neighbor
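These two facts can be checked mechanically. Here is a small Python sketch (function names are my own) that computes the distance of a code by brute force and derives its detection and correction capabilities:

```python
import math
from itertools import combinations

def hamming(a, b):
    """Hamming distance: the number of positions in which two strings differ."""
    return sum(x != y for x, y in zip(a, b))

def code_distance(code):
    """Smallest Hamming distance between two distinct codewords."""
    return min(hamming(a, b) for a, b in combinations(code, 2))

d = code_distance({"000", "111"})
print(d)                     # 3
print(d - 1)                 # detects up to 2 errors
print(math.ceil(d / 2) - 1)  # corrects up to 1 error
```

For the doubling code {00, 11} the same computation gives a distance of 2, so it detects 1 error and corrects 0, matching the discussion below.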

This also explains why I chose to triple instead of double my messages. Had I chosen to double my messages instead, then my code would be {00, 11} out of the set of possible words {00, 01, 10, 11}, and the distance of my code would be 2. By the conclusion above, this code can detect 1 error but correct up to 0 errors; that is, it’s not even an error correcting code!

In future posts, we’ll look at more complicated error correcting codes. For now, let’s take a look at ISBN, an error-detecting code used in practice. ISBN-13 is a 13-digit code for books which specifies information about them, such as author, publisher, language, etc. When scanning the barcode, the scanner might read the code erroneously, or a person entering an ISBN by hand might make a mistake. However, ISBN is designed so that the last digit is a check digit, that is, it’s determined by the previous 12 digits. In fact, given the first 12 digits d_1, d_2, \ldots, d_{12}, the 13th digit is

d_{13} = \left( 10 - \left( (d_1 + 3d_2 + d_3 + 3d_4 + d_5 + 3d_6 + d_7 + 3d_8 + d_9 + 3d_{10} + d_{11} + 3d_{12}) \bmod 10 \right) \right) \bmod 10.

(The outer mod 10 ensures the check digit is 0 rather than 10 when the weighted sum is already a multiple of 10.)

The mod 10 just means we take only the rightmost digit of the sum. This scheme is useful for two reasons. First, to detect an error, one can check whether the sum of all 13 digits, with alternating weights of 1 and 3, has rightmost digit 0. If not, then there is an error. Furthermore, the weights are designed so that if two adjacent digits are transposed, the error can usually be detected; had we weighted all the digits evenly, transposing two adjacent digits would never change the sum. (The one blind spot is a pair of adjacent digits differing by 5, since swapping them changes the weighted sum by a multiple of 10.) This is also an example of an error-detecting code used outside of digital message transmission.
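As a sketch, the check digit computation and the validity test might look like this in Python (function names are my own; 978-0-306-40615-7 is a commonly cited valid example ISBN):

```python
def isbn13_check_digit(first12):
    """Check digit from the first 12 digits, alternating weights 1 and 3."""
    s = sum((1 if i % 2 == 0 else 3) * int(d) for i, d in enumerate(first12))
    return (10 - s % 10) % 10  # outer mod handles sums that are multiples of 10

def isbn13_valid(isbn):
    """Weighted sum of all 13 digits must end in 0."""
    digits = [int(d) for d in isbn if d.isdigit()]
    s = sum((1 if i % 2 == 0 else 3) * d for i, d in enumerate(digits))
    return len(digits) == 13 and s % 10 == 0

print(isbn13_check_digit("978030640615"))  # → 7
print(isbn13_valid("978-0-306-40615-7"))   # → True
print(isbn13_valid("9780306406158"))       # → False (last digit altered)
```

Note that `isbn13_valid` strips hyphens first, so it accepts the usual printed form of an ISBN.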

To recap, you should have learned the following from this post:

  • Why we need error detection and correction
  • The basic idea of error detection and correction
  • Geometric intuition behind error detection and correction
  • An example of error detection in a common setting