There are k! substitution ciphers for an alphabet with k letters—too many for an exhaustive search. With a frequency-based approach adapted to the graph of alphabetic ciphers, we redefine the act of deciphering as a sampling problem suitable for a Metropolis-Hastings random walk. A substitution cipher is thus solvable with a Markov chain.
Let’s begin with a review of some basic cryptography: a simple substitution cipher replaces letters one-for-one in the target language. There are thus possible English substitution ciphers.
Brute force
With the set of all ciphers
we shuffle through
One way of reassignment is by rotation as with the Caesar cipher.
For a brute force tactic the objective is to overwhelm the solution space. With
Assign keys to letters—evaluate, and repeat until a pattern emerges from the yet-to-be decrypted text.
Scoring a Key
We can evaluate a cipherkey’s fitness by comparing the distributions of letter and letter-to-letter frequencies. If we swap two letter assignments, the change in fitness is measured as a difference between corresponding frequency tensors.
To do so efficiently, we swap the corresponding row/column pairs and recompute only the impacted entries.
First order frequency matrix
The first-order frequency tensor is a matrix. For the letter ‘a’ it measures the distribution of letters ‘a’–’z’ that come after it. This is equivalent to counting occurrences of digrams ‘aa,’ ‘ab,’ … , ‘az’ in the cipher text.
The evaluated ciphertext is compared against a baseline: assuming we’re tracking the first-order letter-to-letter frequencies, we obtain the frequency tensor by counting the digrams in a known English text. By comparing the digrams, we form a statistical foundation to solve the substitution cipher by Markov chain. A score is obtained as the cumulative difference between entries in both tensors.
Statistical Objective Function
By comparing scores, obtained from values representing frequencies, we construct the
for the first-order frequencies
The solution to the cipher is the
Global Optimization
In general, the solution key to a ciphertext provides a global optimum to the score function
Instead we must allow our attempted solution to degrade at times—balancing exploration of the solution space with optimization of the score.
Bottlenecks
Solving a cipher one letter-swap at a time, 25/26 letters of the alphabet must be nailed down before we’re able to identify all 26.
In order to get to the appropriate solution, we must pass through certain regions, known as bottlenecks.
A strict optimizer has a hard time passing through the bottleneck. In scoring text, ehllo
One way to do this is with a Metropolis-Hastings (MH) algorithm.
Sampling with Metropolis-Hastings
Suppose we want to sample from the stationary i.e. asymptotic distribution
From the Markov balance equation
define the probability
Here,
We will apply the Metropolis-Hastings algorithm to sample from the set of cipherkeys with good score. In particular, we’d like to compute the unique sample argmin
Proposing a Key
Denoting
accepting any new keys with relative probabilities
where
Controlling the Optimization Rate
For something beyond brute force, however, we can use an Ising model. The Ising model assigns each key proposal a probability depending on differences in score
and a “temperature”
This produces a Metropolis-Hastings random optimizer that we can control through manipulating the Ising parameter
The temperature