How it works
1
Encoding
Frequency analysis results depend on the character encoding of your input.
For all data encodings, the data is first decoded to a Uint8Array and then converted to a string, producing a Latin-1 (ISO/IEC 8859-1) string (the 8-bit “extended” ASCII table). This step takes precedence over any other ciphertext settings.
Display encodings are not decoded before analysis and are analyzed as-is.
2
Ciphertext Settings
Frequency analysis takes into consideration the settings for each ciphertext. These settings currently include:
- Ignore whitespace
- Ignore punctuation
- Ignore casing
- Genericize text
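As a rough illustration of how such settings might normalize a ciphertext before counting, here is a minimal sketch. The `CiphertextSettings` shape and the `applySettings` function are hypothetical names, not the tool's actual API. The sketch assumes that punctuation means ASCII punctuation and that casing is ignored by folding to upper case. "Genericize text" is left out because its behavior is not described here.

```typescript
// Hypothetical settings shape; the names are illustrative only.
interface CiphertextSettings {
  ignoreWhitespace: boolean;
  ignorePunctuation: boolean;
  ignoreCasing: boolean;
}

function applySettings(input: string, settings: CiphertextSettings): string {
  let result = input;
  if (settings.ignoreWhitespace) {
    // Remove spaces, tabs, and line breaks.
    result = result.replace(/\s+/g, "");
  }
  if (settings.ignorePunctuation) {
    // Assumption: "punctuation" means the ASCII punctuation ranges only.
    result = result.replace(/[\x21-\x2F\x3A-\x40\x5B-\x60\x7B-\x7E]/g, "");
  }
  if (settings.ignoreCasing) {
    // Assumption: casing is ignored by folding everything to upper case.
    result = result.toUpperCase();
  }
  return result;
}

const normalized = applySettings("Hello, World!", {
  ignoreWhitespace: true,
  ignorePunctuation: true,
  ignoreCasing: true,
});
console.log(normalized); // HELLOWORLD
```

With all three toggles off, the input is returned unchanged, so every character (including spaces and punctuation) would be counted.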
3
Frequency Calculations
After all of the above steps are performed, the frequency calculations are executed and displayed.
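The decoding and counting steps can be sketched as follows. This is only an approximation of the flow described above, under the assumption that each byte maps to exactly one Latin-1 character; `bytesToLatin1` and `countCharacters` are hypothetical helper names.

```typescript
// Step 1 (assumed shape): one byte becomes one Latin-1 character.
function bytesToLatin1(bytes: Uint8Array): string {
  let out = "";
  bytes.forEach((b) => {
    out += String.fromCharCode(b);
  });
  return out;
}

// Step 3: count how often each character occurs.
function countCharacters(input: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (let i = 0; i < input.length; i++) {
    const ch = input.charAt(i);
    counts.set(ch, (counts.get(ch) ?? 0) + 1);
  }
  return counts;
}

// The hex input "48656c6c6f" corresponds to these five bytes.
const decoded = bytesToLatin1(new Uint8Array([0x48, 0x65, 0x6c, 0x6c, 0x6f]));
const charCounts = countCharacters(decoded);
console.log(decoded);             // Hello
console.log(charCounts.get("l")); // 2
```

Because every byte value from 0 to 255 has a Latin-1 character, arbitrary binary data can be counted this way without decoding errors.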
Frequency Analysis Settings
N-grams and sliding window vs. block analysis
Frequency analysis can be performed on n-grams where n >= 1. For n > 1, it is important
to understand the difference between sliding window and block analysis.
Sliding window
Sliding window analysis “slides” across the ciphertext one character at a time to create overlapping n-grams. For example, with n = 2 the text Hello produces He, el, ll, and lo.
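A minimal sketch of sliding window extraction, assuming the window advances one character at a time; `slidingNgrams` is a hypothetical name used only for illustration.

```typescript
// Collect every overlapping n-gram by advancing the window one character at a time.
function slidingNgrams(input: string, n: number): string[] {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError("n must be an integer >= 1");
  }
  const grams: string[] = [];
  for (let i = 0; i + n <= input.length; i++) {
    grams.push(input.slice(i, i + n));
  }
  return grams;
}

console.log(slidingNgrams("Hello", 2).join(" ")); // He el ll lo
```

A text of length L yields L - n + 1 n-grams this way, so every character takes part in at least one n-gram whenever L >= n.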
Block analysis
Block analysis evaluates your n-grams as non-overlapping chunks. For example, with n = 2 the text Hello produces He and ll. The final character, o, is not present. When using block analysis,
beware of missing data. The ciphertext length (after all toggles are applied)
must be divisible by your n-gram size for all characters to be represented in the
frequency analysis!
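The missing-data caveat can be illustrated with a short sketch. It assumes that a trailing partial block is simply discarded, which matches the Hello example; `blockNgrams` is a hypothetical name, and the `dropped` field exists only to make the lost characters visible.

```typescript
// Split the text into non-overlapping chunks of size n.
// Assumption: a trailing chunk shorter than n is discarded.
function blockNgrams(input: string, n: number): { grams: string[]; dropped: string } {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError("n must be an integer >= 1");
  }
  const usable = input.length - (input.length % n);
  const grams: string[] = [];
  for (let i = 0; i < usable; i += n) {
    grams.push(input.slice(i, i + n));
  }
  return { grams, dropped: input.slice(usable) };
}

const blocks = blockNgrams("Hello", 2);
console.log(blocks.grams.join(" ")); // He ll
console.log(blocks.dropped);         // o
```

When the length is divisible by n (for example `blockNgrams("Hell", 2)`), `dropped` is an empty string and every character is represented.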
Graph vs. Table
There are two display options for Frequency Analysis.
Graph
Shows a bar chart, where each bar is an n-gram that exists in the ciphertext(s). The height of each bar represents how often the n-gram occurs. Graphs also have two orientations, vertical or horizontal, which determine the direction in which the bars are drawn.
Table
Shows a table of values. For ciphertexts with many unique characters, or when comparing many ciphertexts, this may be a better option.
Count vs. Percentage
The frequency analysis y-axis can be toggled to use either a count or a percentage measurement. Note: table view displays both count and percentage.
Count
The count option charts each n-gram by its raw count, that is, how many times it appears in the ciphertext.
Percentage
The percentage option charts each n-gram by its raw count divided by the total n-gram count.
Sort
Sorting works by picking a ciphertext to sort by. You can pick an ascending or descending sort order for the selected ciphertext. All other ciphertexts being presented are graphed wherever they fall along that sort order.
Practical Application
Frequency analysis can be leveraged to:
- Exploit monoalphabetic substitution or simple transposition ciphers.
- Determine if two or more ciphertexts follow similar character distribution.
- Compare n-gram frequencies to the expected frequency distribution of a given language.
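To make the comparison of two ciphertexts concrete, here is a small sketch that computes unigram percentages (raw count divided by total count, times 100) for each text and sorts the rows in descending order by the first one. The names `percentages`, `compareSorted`, and `Row` are hypothetical, and breaking ties alphabetically is an assumption made only to keep the output deterministic.

```typescript
interface Row {
  ngram: string;
  first: number;  // percentage in the ciphertext used for sorting
  second: number; // percentage in the other ciphertext
}

// Percentage of each character: raw count divided by total count, times 100.
function percentages(input: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (let i = 0; i < input.length; i++) {
    const ch = input.charAt(i);
    counts.set(ch, (counts.get(ch) ?? 0) + 1);
  }
  const result = new Map<string, number>();
  counts.forEach((count, ch) => {
    result.set(ch, (count / input.length) * 100);
  });
  return result;
}

// Sort descending by the first ciphertext; the second one follows that order.
function compareSorted(a: string, b: string): Row[] {
  const pa = percentages(a);
  const pb = percentages(b);
  const keys = new Set<string>();
  pa.forEach((_value, key) => { keys.add(key); });
  pb.forEach((_value, key) => { keys.add(key); });
  const rows: Row[] = [];
  keys.forEach((key) => {
    rows.push({ ngram: key, first: pa.get(key) ?? 0, second: pb.get(key) ?? 0 });
  });
  rows.sort((x, y) =>
    y.first - x.first || (x.ngram < y.ngram ? -1 : x.ngram > y.ngram ? 1 : 0)
  );
  return rows;
}

const rows = compareSorted("AABC", "BBCD");
console.log(rows.map((r) => r.ngram).join("")); // ABCD
```

Here A leads because it makes up 50% of the first text, even though it never appears in the second. Large gaps between the `first` and `second` columns suggest the two texts do not share a character distribution.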
Caveats
- Frequency analysis typically requires a sufficiently long ciphertext to be effective.
- Polyalphabetic ciphers are typically resistant to frequency analysis.
- Plaintexts designed to skew the frequency results, such as lipograms, can mislead analysis.
- Non-standard alphabets and compressed data may result in less effective analysis.