NAME
frequency-correlate
SYNOPSIS
frequency-correlate.py _reference_text_
DESCRIPTION
Reads text from stdin and builds a frequency histogram to compare against that of
a reference text, whose file name is given as the argument. The text is
assumed to be the 26 ASCII letters. All non-letters are discarded and all letters
are changed to lower case.
The histograms of stdin and the reference are normalized and the dot product
is taken for each possible rotation of stdin. Each output value is labeled with its
rotation, expressed as a letter (a for 0, b for 1, etc.), and lies in the [0,1] interval.
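The computation described above can be sketched as follows. This is a minimal illustration, not the course's reference implementation: the function names unit_histogram and correlations are hypothetical, and the direction of rotation (cipher letter j + shift is matched with reference letter j) is an assumption.

```python
import math
import string

ALPHABET = string.ascii_lowercase


def unit_histogram(text):
    """Count the letters a-z in text (case folded) and scale to unit length."""
    counts = [0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":  # discard everything except the 26 ASCII letters
            counts[ord(ch) - ord("a")] += 1
    norm = math.sqrt(sum(c * c for c in counts))
    if norm == 0:
        raise ValueError("text contains no ASCII letters")
    return [c / norm for c in counts]


def correlations(cipher_text, reference_text):
    """Return (letter, dot product) pairs, one for each of the 26 rotations."""
    v = unit_histogram(cipher_text)
    r = unit_histogram(reference_text)
    results = []
    for shift in range(26):
        # Assumption: rotating by `shift` matches cipher letter (j + shift)
        # with reference letter j.
        dot = sum(r[j] * v[(j + shift) % 26] for j in range(26))
        results.append((ALPHABET[shift], dot))
    return results
```

Because both histograms are non-negative unit vectors, every dot product falls in [0,1], and the rotation with the largest value is the most likely shift.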
POSITIONAL ARGUMENTS
reference_text file of reference text for the reference distribution
OPTIONAL ARGUMENTS
none
HISTORY
Introduced in the 221 offering (Fall 2021–2022).
BUGS
NAME
index-of-coincidence
SYNOPSIS
index-of-coincidence.py _maximum-circulate_
DESCRIPTION
Takes stdin and, for each circulant of the text, from zero to the maximum, counts
the number of collisions between the original text and the circulant.
If the text is the character sequence s_1 s_2 ... s_n then the circulant by i
is the character sequence s_(1+i) s_(2+i) ... s_n s_1 ... s_i. A collision
between the sequences s_1 s_2 ... s_n and t_1 t_2 ... t_n is a location j
where s_j==t_j.
The output is the count of collisions for each amount of circulation, normalized
to the [0,1] interval.
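A minimal sketch of the collision count follows. The function name collision_fractions is hypothetical, and two choices are assumptions: the count is normalized by dividing by the text length n, and the circulation runs from 0 through the maximum as in the description (the positional argument text suggests starting at 1).

```python
def collision_fractions(text, max_circulate):
    """For i = 0..max_circulate, return the fraction of positions j at which
    text[j] equals the circulant's character text[(j + i) % n]."""
    n = len(text)
    if n == 0:
        raise ValueError("text is empty")
    fractions = []
    for i in range(max_circulate + 1):
        # The circulant by i places s_(j+i) at position j, wrapping around.
        collisions = sum(1 for j in range(n) if text[j] == text[(j + i) % n])
        fractions.append(collisions / n)  # assumption: normalize by length
    return fractions
```

For example, the text "abcabc" repeats with period 3, so circulating by 3 makes every position collide while circulating by 1 or 2 produces no collisions.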
POSITIONAL ARGUMENTS
maximum-circulate the test is run for circulations 1 through maximum-circulate
OPTIONAL ARGUMENTS
none
HISTORY
Introduced in the 221 offering (Fall 2021–2022).
BUGS
NAME
column-extract
SYNOPSIS
column-extract.py _offset_ _skip_
DESCRIPTION
Filters stdin, passing to stdout only the i-th character of stdin when
i = offset + k*skip for some integer k >= 0. All other characters are ignored.
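The filter can be sketched with a Python slice. The function name extract_column is hypothetical, and the sketch assumes positions are counted from 0 and that no characters are discarded before counting; the description does not settle either point.

```python
def extract_column(text, offset, skip):
    """Return the characters of text at indices offset, offset + skip,
    offset + 2*skip, and so on (0-based indices are an assumption)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    if skip <= 0:
        raise ValueError("skip must be a positive integer")
    return text[offset::skip]
```

A script built on this sketch could read all of stdin with sys.stdin.read(), convert the two arguments with int(), and write the result to stdout.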
POSITIONAL ARGUMENTS
offset The first letter to pass from stdin to stdout
skip Every letter skip places after the last letter passed to stdout
is passed to stdout.
OPTIONAL ARGUMENTS
none
HISTORY
Introduced in the 221 offering (Fall 2021–2022).
BUGS
Accept the GitHub Classroom assignment, and clone the GitHub repo to your desktop. Add, commit, and push whenever reasonable, and in any case before the assignment deadline, for the purposes of grading.
Frequency distribution exploration
The English language has a distinct and stable distribution of letters. Although we treat letter occurrence as a random process, the distribution is more naturally read as the percentage of occurrences of each letter, with all letters together making up 100%. Here is what it looks like.
For a simple shift cipher, this stability is exploited to break the cipher. The shift disguises the letters but leaves the pattern intact, only circulated (shifted along the base). To recover the shift, all circulants of the frequency distribution of the enciphered text are computed, and the one with the smallest distance from a reference is chosen.
Expressing the frequency distributions as unit-length vectors in ℝ^26, the larger the dot product, the closer the distributions. Let v(i) be the vector from the enciphered text circulated by i, and let r be the reference vector. Find the i that attains the maximum:
max_i < r | v(i) >
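The link between "smallest distance" and "largest dot product" can be written out explicitly. The indexing below is an assumption: components are numbered 0 to 25, and circulation follows the same convention as the circulant s_(1+i) s_(2+i) ... defined for index-of-coincidence.

```latex
\[
  v(i)_j = v_{(j+i) \bmod 26}, \qquad
  \langle r \mid v(i) \rangle = \sum_{j=0}^{25} r_j \, v_{(j+i) \bmod 26}
\]
\[
  \lVert r - v(i) \rVert^2
    = \lVert r \rVert^2 + \lVert v(i) \rVert^2 - 2 \langle r \mid v(i) \rangle
    = 2 - 2 \langle r \mid v(i) \rangle
\]
```

Since r and every v(i) have unit length, minimizing the Euclidean distance over i is the same as maximizing the dot product over i.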
Tasks

author: burton rosenberg
created: 31 aug 2021
update: 2 sep 2023