Preliminaries
Let:
- P be the finite set of Plaintexts, with x a representative element.
- C be the finite set of Ciphertexts, with y a representative element.
- K be the Keyspace, with k a representative key.
- A cryptosystem assigns to each k in K an encryption rule e_k
and a decryption rule d_k leading from plain to ciphertext and from
cipher to plaintext, respectively.
Some ciphers transform the text by working letter by letter.
Other ciphers transform groups of letters at a time. We
first discuss the shift, affine and substitution ciphers,
which work letter by letter. That is, If the plaintext message
is P = p_0 p_1 ... p_r, and e_k is the encryption function,
then the ciphertext message is e_k(p_0) e_k(p_1) ... k_k(p_r).
Shift cipher
The plaintext is a string of characters over an alphabet of size N.
Given a key k in [0,N), the encipherment rule is e(p_i)=p_i + k mod N.
Two well known examples are the Caesar Cipher, where k=3, and
the Internet's popular rot-13, where k=13. Note that for a 26
character alphabet, rot-13 is involutory: encryption twice recovers
the message.
The main tool for breaking this cipher is frequency counting the
letters in the ciphertext. E is the most frequent letter in
English, occuring about 13% of the time. The next most frequent
letter is T, occuring about 9% of the time. Aligning the spikes
in the observed histogram of the ciphertext will generally reveal
the shift amount.
Affine cipher
The alphabets are again treated as numbers modulo some integer N.
The encryption function is a general linear formula,
e(p_i) = A p_i + B mod N. In order that the encryption function
invert, A must be relatively prime to N.
Affine ciphers can be broken by finding the two most frequent
letters and assign them to E and T respectively. You shall then have
a pair of equations which can be solved using linear algebra.
Repeating this with reasonable guesses at E and T should yield
a solution.
Substitution cipher
Shift and affine ciphers are special cases of the
substitution cipher.
The substitution cipher is the choice of a general permutation
of the plaintext to ciphertext alphabets. Since the permutation
can be arbitrary, in order to reconstruct the permutation
it is important to collect statistics on
diagraphs (two letter combinations) as well as have a histogram of
letter frequencies.
A bit a guess work is required to puzzle out a substitution cipher.
Vigenere cipher
A Vigenere cipher is a Shift cipher where the shift amount
depends on the on the index of the letter in the plaintext.
Typically, a keyword is chosen, of length l, and the plaintext
broken into blocks of l letters each. If the keyword is
K = k_0 k_1 .. k_(l-1), and the m-th block of plaintext is
p_(m+0) p_(m+1) ... p_(m+l-1), where m is a multiple of l,
the m-th block of ciphertext is e(p_(m+i))= p_(m+i) + k_i mod N.
Breaking a Vigenere cipher is a two step process. First to
determine l, the length of the keyword. Second, to dissect
the ciphertext into its l rows and to determine the shift of
each row separately, using the methods required of a Shift
cipher.
Because some letters are more common than others, such as
E and T being very common, if you chose at random two letters
from a text, there is a better than 1/26 chance that the letters
will be the same. In fact, there is a 6.5% chance that the
letters will be the same. Concerning a piece of Vigenere
cipher, a correct guess of keyword length l will give this
unusually high probability of letter collision when letters
are randomly drawn from columns a multiple of l apart.
A graph is made of the collision probabilities observed for
various l and the spikes in the graph will give away the
keyword length.