Skip to content

This program cracks a substitution cipher using Metropolis-Hastings

Notifications You must be signed in to change notification settings

timurka43/cipher_cracker

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 

Repository files navigation

cipher_cracker

This program cracks a substitution cipher using Metropolis-Hastings Algorithms

Algorithm Idea:

The general idea of how the metropolis-hastings algorithm works in this setting is as follows: The algorithm repeatedly generates a reverse substitution cipher, deciphers the input text and evalutes to what extent the new deciphered text appears to be in proper English. The closer the text....

Defining "Proper English" with Tolstoy's War and Peace

A common way to quantify whether a string of text comes from a natural English language is to calculate the proportion of all possible character transitions in the text. (e.g the percent of transions from t to h will be relatively high as words like this, that, the are very common in english. On the other hand, transitions from k to t are very rare if at all present in English (I can't think of one). While it may appear counter-intuitive to use the translation of a Russian text to model English language's character transitions, it somehow has become a common practice to use War and Peace for this purpose. The vast volume of the text lets us use a single file, while also providing a large enough sample size of characters to model our population distribution of character transitions in English.

Potential improvements and optimizations:

Currently the program deciphers the text to the point where oftentimes where a small number of letters are still switched (e.g. all w's are replaced with t's and vice versa). While it usually works well enough so the user can understand the original message, it can be modified to almost always produce the exact original text with no substitions left. To implement this improvement, one simply needs to add multiple threads to this algorithm and then find the best result across all threads. The idea behind this is simple: the quality of "proper English" generated by a random reverse cipher can be mapped by some polynomial function with multiple local maxima and minima. As any one thread starts randomly anywhere on the function and is more likely to move in the direction of the most local critical point, any one thread will likely record that local maximum/minimum as its result, which may be good enough to understand the initial message but still contain mistakes. However, if we generate multiple threads, their starting point are will be spread out across the function's domain such that the threads will likely find all critical points. Thus, by comparing the results of multiple threads, we can almost always guarantee that we found a global minimum/maximum.

About

This program cracks a substitution cipher using Metropolis-Hastings

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages