Theorem
H(K|C^n) = H(K) + H(P^n) - H(C^n)
Proof
Note that:
H(K,C^n,P^n) = H(C^n|K,P^n) + H(K,P^n) = H(P^n|K,C^n) + H(K,C^n)
and that
H(C^n|K,P^n) = H(P^n|K,C^n) = 0,
since the ciphertext is completely determined by the plaintext and the key, and likewise the plaintext by the ciphertext and the key.
Further, K and P^n are chosen independently.
Therefore:
H(K,C^n) = H(K,C^n,P^n) = H(K,P^n) = H(K) + H(P^n)
H(K,C^n) = H(K|C^n) + H(C^n)
Combining the equalities completes the proof.
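The identity can be checked numerically on a small example. The sketch below uses a hypothetical toy shift cipher over a 3-letter alphabet with a uniform key and a non-uniform plaintext distribution (both assumptions chosen for illustration), and verifies H(K|C) = H(K) + H(P) - H(C) for n = 1 by enumerating the joint distribution:

```python
from math import log2
from collections import defaultdict

ALPHA = 3                                    # toy alphabet size (assumed)
p_plain = {0: 0.5, 1: 0.3, 2: 0.2}           # assumed plaintext distribution
p_key = {k: 1 / ALPHA for k in range(ALPHA)} # keys equally likely

def H(dist):
    """Shannon entropy in bits of a probability mapping."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

# Joint distribution of (key, ciphertext) under e_k(x) = (x + k) mod ALPHA.
joint = defaultdict(float)
p_cipher = defaultdict(float)
for x, px in p_plain.items():
    for k, pk in p_key.items():
        y = (x + k) % ALPHA
        joint[(k, y)] += px * pk
        p_cipher[y] += px * pk

# H(K|C) = H(K,C) - H(C), compared against H(K) + H(P) - H(C).
lhs = H(joint) - H(p_cipher)
rhs = H(p_key) + H(p_plain) - H(p_cipher)
print(f"H(K|C) = {lhs:.6f},  H(K)+H(P)-H(C) = {rhs:.6f}")
```

Both sides agree to machine precision; the same enumeration works for any small cryptosystem.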
The entropy H(K|C^n)
is bounded above by log( s_n + 1 ), where s_n
is the average number of spurious keys
associated with a ciphertext. That is, define:
K(y) = { k in K | exists x in P^n, p(x)>0, and e_k(x)=y },
the set of keys which could possibly connect a plausible plaintext with a given ciphertext. The average of |K(y)| - 1 (discounting the one correct key) is s_n:
s_n = sum_{y in C^n} p(y)( |K(y)|-1 ) = -1 + sum_{y in C^n} p(y) |K(y)|
So, using Jensen's inequality (the concavity of log) for the second inequality:
H(K|C^n) = sum_y p(y) H(K|y) <= sum_y p(y) log |K(y)| <= log sum_y p(y) |K(y)| = log ( s_n + 1 )
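The bound can also be checked on the same hypothetical toy shift cipher (3-letter alphabet, n = 1, assumed distributions as before): enumerate K(y) for each ciphertext, compute s_1, and confirm H(K|C) <= log( s_1 + 1 ):

```python
from math import log2
from collections import defaultdict

ALPHA = 3
p_plain = {0: 0.5, 1: 0.3, 2: 0.2}           # assumed plaintext distribution
p_key = {k: 1 / ALPHA for k in range(ALPHA)} # keys equally likely

def H(dist):
    return -sum(p * log2(p) for p in dist.values() if p > 0)

joint, p_cipher = defaultdict(float), defaultdict(float)
K_of_y = defaultdict(set)
for x, px in p_plain.items():
    for k, pk in p_key.items():
        y = (x + k) % ALPHA
        joint[(k, y)] += px * pk
        p_cipher[y] += px * pk
        K_of_y[y].add(k)   # k maps some plausible plaintext x to y

# Average number of spurious keys and the conditional entropy.
s1 = sum(p_cipher[y] * (len(K_of_y[y]) - 1) for y in p_cipher)
H_K_given_C = H(joint) - H(p_cipher)
print(f"s_1 = {s1:.4f}, H(K|C) = {H_K_given_C:.4f}, log(s_1+1) = {log2(s1 + 1):.4f}")
```

For a shift cipher every key is consistent with every single-character ciphertext, so |K(y)| = 3 for all y and s_1 = 2; the inequality holds strictly because the keys are not equally likely given y unless the plaintext is uniform.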
Definition
The entropy of a natural language L is defined as the limit
H_L = lim_{n to infinity} H(P^n) / n
and the redundancy of the language as:
R_L = 1 - ( H_L / log |P| )
See Stinson, the class text, for further discussion.
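A first-order estimate of H_L can be computed from single-letter frequencies. The sketch below uses approximate English letter frequencies (commonly tabulated values, assumed here); note this only upper-bounds H_L, since it ignores digram and longer-range correlations. Empirical estimates place the true H_L of English near 1.25 bits per letter, giving R_L of roughly 0.75 — the figure used in the unicity-distance example below.

```python
from math import log2

# Approximate English letter frequencies in percent (assumed values).
freq = {
    'a': 8.2, 'b': 1.5, 'c': 2.8, 'd': 4.3, 'e': 12.7, 'f': 2.2,
    'g': 2.0, 'h': 6.1, 'i': 7.0, 'j': 0.15, 'k': 0.77, 'l': 4.0,
    'm': 2.4, 'n': 6.7, 'o': 7.5, 'p': 1.9, 'q': 0.095, 'r': 6.0,
    's': 6.3, 't': 9.1, 'u': 2.8, 'v': 0.98, 'w': 2.4, 'x': 0.15,
    'y': 2.0, 'z': 0.074,
}
total = sum(freq.values())
# First-order entropy: an upper bound on H_L (ignores correlations).
H1 = -sum((f / total) * log2(f / total) for f in freq.values())
R1 = 1 - H1 / log2(26)
print(f"first-order entropy ~ {H1:.2f} bits, first-order redundancy ~ {R1:.2f}")
print(f"with the assumed H_L ~ 1.25 bits: R_L ~ {1 - 1.25 / log2(26):.2f}")
```

The gap between the first-order redundancy (about 0.1) and the true redundancy (about 0.75) shows how much of English's structure lives in letter correlations rather than letter frequencies.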
We make a few more assumptions and approximations. Using the approximations
H(P^n) =(approx) n H_L = n( 1 - R_L ) log |P|, H(C^n) <= n log |C|,
and taking |C| = |P|, we combine the inequalities above to yield:
log( s_n + 1 ) >= H(K) - n R_L log |P|
Theorem (Unicity distance)
The expected number of spurious keys
in a cryptosystem where |C| = |P|,
and where keys are
chosen equally likely so that H(K) = log |K|, satisfies:
s_n >= ( |K| / |P|^(n R_L) ) - 1
The unicity distance is the value of n
for which the right hand side becomes zero, and is a
lower bound on the number of ciphertext characters needed in order to
determine the key:
n =(approx) log |K| / ( R_L log |P| )
As an example, for the substitution cipher over English, the unicity distance is approximately 25 characters.
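This figure is easy to reproduce from the formula. The sketch below assumes R_L = 0.75 for English (the commonly used estimate) and takes the key space of the substitution cipher to be all 26! permutations of the alphabet:

```python
from math import factorial, log2

K_size = factorial(26)   # all permutations of the 26-letter alphabet
R_L = 0.75               # assumed redundancy of English
P_size = 26

# Unicity distance: n ~ log|K| / ( R_L log|P| )
n0 = log2(K_size) / (R_L * log2(P_size))
print(f"log|K| = {log2(K_size):.1f} bits, unicity distance ~ {n0:.1f} characters")
```

With log |K| ≈ 88 bits and R_L log |P| ≈ 3.5 bits per character, the quotient comes out near 25, matching the value quoted above.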