Preliminaries

Let: Some ciphers transform the text by working letter by letter. Other ciphers transform groups of letters at a time. We first discuss the shift, affine and substitution ciphers, which work letter by letter. That is, If the plaintext message is P = p_0 p_1 ... p_r, and e_k is the encryption function, then the ciphertext message is e_k(p_0) e_k(p_1) ... k_k(p_r).

Shift cipher

The plaintext is a string of characters over an alphabet of size N. Given a key k in [0,N), the encipherment rule is e(p_i)=p_i + k mod N.

Two well known examples are the Caesar Cipher, where k=3, and the Internet's popular rot-13, where k=13. Note that for a 26 character alphabet, rot-13 is involutory: encryption twice recovers the message.

The main tool for breaking this cipher is frequency counting the letters in the ciphertext. E is the most frequent letter in English, occuring about 13% of the time. The next most frequent letter is T, occuring about 9% of the time. Aligning the spikes in the observed histogram of the ciphertext will generally reveal the shift amount.

Affine cipher

The alphabets are again treated as numbers modulo some integer N. The encryption function is a general linear formula, e(p_i) = A p_i + B mod N. In order that the encryption function invert, A must be relatively prime to N.

Affine ciphers can be broken by finding the two most frequent letters and assign them to E and T respectively. You shall then have a pair of equations which can be solved using linear algebra. Repeating this with reasonable guesses at E and T should yield a solution.

Substitution cipher

Shift and affine ciphers are special cases of the substitution cipher. The substitution cipher is the choice of a general permutation of the plaintext to ciphertext alphabets. Since the permutation can be arbitrary, in order to reconstruct the permutation it is important to collect statistics on diagraphs (two letter combinations) as well as have a histogram of letter frequencies.

A bit a guess work is required to puzzle out a substitution cipher.

Vigenere cipher

A Vigenere cipher is a Shift cipher where the shift amount depends on the on the index of the letter in the plaintext. Typically, a keyword is chosen, of length l, and the plaintext broken into blocks of l letters each. If the keyword is K = k_0 k_1 .. k_(l-1), and the m-th block of plaintext is p_(m+0) p_(m+1) ... p_(m+l-1), where m is a multiple of l, the m-th block of ciphertext is e(p_(m+i))= p_(m+i) + k_i mod N.

Breaking a Vigenere cipher is a two step process. First to determine l, the length of the keyword. Second, to dissect the ciphertext into its l rows and to determine the shift of each row separately, using the methods required of a Shift cipher.

Because some letters are more common than others, such as E and T being very common, if you chose at random two letters from a text, there is a better than 1/26 chance that the letters will be the same. In fact, there is a 6.5% chance that the letters will be the same. Concerning a piece of Vigenere cipher, a correct guess of keyword length l will give this unusually high probability of letter collision when letters are randomly drawn from columns a multiple of l apart.

A graph is made of the collision probabilities observed for various l and the spikes in the graph will give away the keyword length.