Breaking Vigenere

by: burt rosenberg
at: university of miami
date: 2 sep 2023
NAME
   frequency-correlate

SYNOPSIS
    frequency-correlate.py _reference_text_

DESCRIPTION
    Takes stdin and creates a frequency histogram to compare against that of the
    standard text, given the file name as the argument to the function. The text is
    assumed to be the 26 ASCII letters. All non-letters are discarded and all letters
    are changed to lower case.
    
    The histograms of stdin and the reference are normalized and the dot product 
    is taken for each possible rotation of stdin. The output has the rotation
    expressed as a letter (a for 0, 1 for b, etc) and will be in a [0,1] interval. 
    
POSITIONAL ARGUMENTS
    reference_text     file of reference text for the reference distribution

OPTIONAL ARGUMENTS
    none

HISTORY
    Introduced in the 221 offering (Fall 2021–2022).

BUGS

NAME
   index-of-coincidence

SYNOPSIS
    index-of-coincidence.py _maximum-circulate_

DESCRIPTION
    Takes stdin and and for each circulant of the text, from zero to the maxium, counts
    the number of collisions between the original text and the circulant.
    
    If the text is the character sequence s_1 s_2 ... s_n then the circulant by i
    is the character sequence s_(1+i) s_(2+i) ... s_n s_1 ... s_i. A collision 
    between the sequences s_1 s_2 ... s_n and t_1 t_2 ... t_n are locations j 
    where s_j==t_j.
    
    The output is the count of collisions for each amount of circulation, normalized
    to the [0,1] interval
    
POSITIONAL ARGUMENTS
    maximum-circulate     the test is for 1 to number maximum-circulate

OPTIONAL ARGUMENTS
    none

HISTORY
    Introduced in the 221 offering (Fall 2021–2022).

BUGS
NAME
    column-extract

SYNOPSIS
    column-extract.py _offset_ _skip_

DESCRIPTION
    Filters stdin ignoring all but characters passing to stdout the i-th character
    from stdin, when i = offest+i*skip, for integers i.
    
POSITIONAL ARGUMENTS
    offset   The first letter to pass from stdin to stdout
    skip     Every letter skip places after the last letter passed to stdout
             is passed to stdout.

OPTIONAL ARGUMENTS
    none

HISTORY
    Introduced in the 221 offering (Fall 2021–2022).

BUGS
Preparation

Accept the github classroom assignment, and clone the github repo to your desktop. Add, commit, push whenever reasonable, but also before the deadline of the assignment, for the purposes of grading.

Frequency distribution exploration

The english language has a distinct and stable distribution of letters. While we consider it a random process, more naturally it is the percentage of occurrences of each letter out of 100% for all letters. Here is what it looks like.

For a simple shift cipher, this uniformity is exploited to break the cipher. The shift disguises the letters but leaves the pattern intact, only circulated (shifted on the base). To recover the shift, all circulants of the frequency distribution of the enciphered text and the one with the smallest distance from a reference is chosen.

Expressing the frequency distributions as unit length vector in ℕ26, the larger the dot product the closer the distributions. Let v(i) be the circulated by i vector from the enciphered text, and r be the standard, find i such that.

      max_i < r | v(i) >

Tasks

  1. Complete the code in the three python functions.
  2. Break the first challenge text, challenge-1.txt.
  3. Break the second challenge text, challenge-2.txt.
Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

author: burton rosenberg
created: 31 aug 2021
update: 2 sep 2023