How it works
1
Encoding
Index of coincidence (abbreviated as “IC”) depends on the character encoding of your input. For all data encodings, the data is first decoded to a Uint8Array and then converted to a string, producing a Latin-1 (ISO/IEC 8859-1) string (the 8-bit “extended” ASCII table). This step takes precedence over any other ciphertext settings.
Display encodings are not decoded before analysis and are analyzed as-is.
2
Ciphertext Settings
IC takes into consideration the settings for each ciphertext. These settings currently include:
- Ignore whitespace
- Ignore punctuation
- Ignore casing
- Genericize text
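The first two steps can be sketched in Python as shown below. This is a minimal illustration built on assumptions, not the widget's code:
- `decode_to_latin1` and `apply_settings` are hypothetical names.
- Hex and Base64 stand in for whatever data encodings the tool actually supports.
- "Punctuation" is assumed to mean ASCII punctuation only.
- Genericize text is not modeled, because its behavior is not described here.

A display-encoded ciphertext would skip `decode_to_latin1` and go straight to `apply_settings`.

```python
import base64
import string


def decode_to_latin1(data: str, encoding: str) -> str:
    """Hypothetical helper: decode a data encoding to raw bytes, then map
    each byte (0-255) to the Latin-1 character with the same code point."""
    if encoding == "hex":
        raw = bytes.fromhex(data)
    elif encoding == "base64":
        raw = base64.b64decode(data)
    else:
        raise ValueError(f"unsupported data encoding: {encoding}")
    # Never fails: Latin-1 assigns a character to all 256 byte values
    return raw.decode("latin-1")


def apply_settings(text: str, ignore_whitespace: bool = True,
                   ignore_punctuation: bool = True,
                   ignore_casing: bool = True) -> str:
    """Hypothetical helper: apply the ciphertext toggles before counting."""
    if ignore_whitespace:
        text = "".join(ch for ch in text if not ch.isspace())
    if ignore_punctuation:
        # Assumption: "punctuation" means the ASCII punctuation set only
        text = "".join(ch for ch in text if ch not in string.punctuation)
    if ignore_casing:
        text = text.upper()
    return text


# Decoding runs first; the toggles are then applied to the decoded string
text = apply_settings(decode_to_latin1("48656c6c6f2c20576f726c6421", "hex"))
print(text)  # HELLOWORLD
```

Because decoding happens before the toggles, the whitespace and punctuation checks see the decoded characters rather than the hex digits.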
3
IC Calculations
After all of the above steps are performed, the IC calculations are executed and displayed.
IC formula
This widget uses the following formula for calculating index of coincidence:
$$\mathrm{IC} = \frac{\sum_i n_i (n_i - 1)}{N (N - 1)}$$
where:
- $n_i$ = frequency of n-gram $i$ (or character $i$ when the n-gram size is 1)
- $N$ = total number of n-grams (or characters)
- $\Sigma$ = sum over all unique n-grams/characters
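A direct Python translation of the formula above is shown as a sketch. Note two assumptions:
- `index_of_coincidence` is a hypothetical name.
- Returning 0.0 when there are fewer than two items (where $N(N - 1)$ would be zero) is not documented widget behavior.

```python
from collections import Counter
from typing import Sequence


def index_of_coincidence(items: Sequence[str]) -> float:
    """IC = sum(n_i * (n_i - 1)) / (N * (N - 1)).

    `items` is a string (IC over characters) or a list of n-grams.
    Assumption: fewer than two items returns 0.0 (denominator would be zero).
    """
    counts = Counter(items)   # n_i for every unique item i
    total = len(items)        # N
    if total < 2:
        return 0.0
    numerator = sum(n * (n - 1) for n in counts.values())
    return numerator / (total * (total - 1))


print(index_of_coincidence("HELLO"))       # L appears twice: 2 / 20 = 0.1
print(index_of_coincidence(["He", "ll"]))  # no repeated bigram: 0.0
```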
Periodic IC formula
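Periodic IC is calculated differently depending on the n-gram mode.
N-gram mode: Block
For this mode, the process is:
- Generate n-grams $g_0, g_1, \dots$ from the entire text
- Group n-grams by their index modulo the period $p$: $G_k = \{ g_j : j \bmod p = k \}$ for $k = 0, \dots, p - 1$
- Calculate IC for each group using the basic IC formula
- Average the IC values across all groups: $\mathrm{IC}_p = \frac{1}{p} \sum_{k=0}^{p-1} \mathrm{IC}(G_k)$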
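The block-mode steps can be sketched in Python as follows. The sketch makes three assumptions:
- N-grams are 0-indexed.
- An incomplete trailing n-gram is discarded, consistent with the block analysis notes under Index of Coincidence Settings.
- Groups with fewer than two n-grams contribute 0.0.

All function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def index_of_coincidence(items: Sequence[str]) -> float:
    """Basic IC; returns 0.0 for fewer than two items (assumption)."""
    counts = Counter(items)
    total = len(items)
    if total < 2:
        return 0.0
    return sum(n * (n - 1) for n in counts.values()) / (total * (total - 1))


def block_ngrams(text: str, n: int) -> List[str]:
    """Non-overlapping n-grams; a trailing chunk shorter than n is dropped."""
    return [text[i:i + n] for i in range(0, len(text) - n + 1, n)]


def periodic_ic_block(text: str, period: int, n: int = 1) -> float:
    """Group n-grams by index mod period, then average the group ICs."""
    grams = block_ngrams(text, n)
    # groups[k] holds every n-gram whose index j satisfies j % period == k
    groups = [grams[k::period] for k in range(period)]
    return sum(index_of_coincidence(g) for g in groups) / period


print(periodic_ic_block("ABABABAB", 2))  # groups AAAA and BBBB -> 1.0
print(periodic_ic_block("ABABABAB", 1))  # one group -> 24 / 56
```

The slice `grams[k::period]` is a compact way to collect exactly the n-grams whose index is congruent to `k` modulo the period.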
N-gram mode: Sliding window
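For this mode, the process is:
- Slide a window of size ngramSize across the text
- For each window:
  - Generate n-grams within the window
  - Group by index modulo period (same as block mode)
  - Calculate and average group ICs
- Average the window ICs: $\mathrm{IC}_p = \frac{1}{W} \sum_{w=1}^{W} \mathrm{IC}_{p,w}$, where $W$ is the number of windows and $\mathrm{IC}_{p,w}$ is the averaged group IC of window $w$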
Index of Coincidence Settings
N-grams and sliding window vs. block analysis
IC can be computed on n-grams where n ≥ 1. For n > 1, it is important
to understand the difference between sliding window and block analysis.
N-gram mode: Sliding window
Sliding window analysis “slides” across the ciphertext to create overlapping n-grams. For the text Hello with an n-gram size of 2, this produces He, el, ll, lo.
N-gram mode: Block analysis
Block analysis evaluates your n-grams as non-overlapping chunks. For Hello with an n-gram size of 2, this produces He and ll; the last character,
o, is not present. When using block analysis,
beware of missing data: the ciphertext length (after all toggles are applied)
must be divisible by your n-gram size for all characters to be represented in the
IC!
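To compare the two modes side by side, here is a small Python sketch with hypothetical helper names. It applies both modes to Hello with an n-gram size of 2 and reports which characters block analysis leaves out.

```python
from typing import List


def sliding_ngrams(text: str, n: int) -> List[str]:
    """Overlapping n-grams: one starting at every position."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def block_ngrams(text: str, n: int) -> List[str]:
    """Non-overlapping n-grams; a trailing chunk shorter than n is dropped."""
    return [text[i:i + n] for i in range(0, len(text) - n + 1, n)]


def dropped_by_block(text: str, n: int) -> str:
    """Characters block analysis leaves out (empty when len(text) % n == 0)."""
    return text[len(text) - len(text) % n:]


print(sliding_ngrams("Hello", 2))    # ['He', 'el', 'll', 'lo']
print(block_ngrams("Hello", 2))      # ['He', 'll']
print(dropped_by_block("Hello", 2))  # o
```

A non-empty result from `dropped_by_block` indicates that the length check described above has failed.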
Graph vs. Table
There are two display options for IC.
Graph (Periodic analysis)
Shows a line chart, where each line is the periodic IC of a ciphertext. The height represents the IC, and the horizontal axis represents the period.
Table
Shows a table of values, with a single IC value for each ciphertext. This is the IC for the entire ciphertext.
Max Period
The max period is used in the graph to determine how far the x-axis will go. This helps with scaling the data in case ciphertexts are of varying lengths.
Show Average Lines
In the periodic analysis graph, shows a dotted line for each ciphertext indicating its average IC.
Practical Application
IC can be leveraged to:
- Determine if a cipher is periodic or aperiodic.
- Discover the key lengths of periodic ciphers, such as Vigenère with a repeating key.
- Compare the periods of two or more ciphers.
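As a rough illustration of key-length discovery, the sketch below computes the periodic IC (single characters, block grouping) for every period from 1 to a maximum, similar to a series the graph could plot up to Max Period. It then picks the highest-scoring period.

The helper names are hypothetical, and the selection rule is deliberately naive. Multiples of the true period score about as high as the period itself, so in practice you look for the first clear spike rather than the global maximum. The input is assumed to be already preprocessed (for example, uppercase letters only).

```python
from collections import Counter
from typing import Dict, Sequence


def index_of_coincidence(items: Sequence[str]) -> float:
    """Basic IC; returns 0.0 for fewer than two items (assumption)."""
    counts = Counter(items)
    total = len(items)
    if total < 2:
        return 0.0
    return sum(n * (n - 1) for n in counts.values()) / (total * (total - 1))


def periodic_ic_series(text: str, max_period: int) -> Dict[int, float]:
    """Periodic IC over single characters for periods 1..max_period."""
    series: Dict[int, float] = {}
    for period in range(1, max_period + 1):
        # Characters at positions k, k + period, k + 2*period, ...
        groups = [text[k::period] for k in range(period)]
        series[period] = sum(index_of_coincidence(g) for g in groups) / period
    return series


def guess_key_length(text: str, max_period: int) -> int:
    """Naive guess: the period with the highest periodic IC.

    Exact ties go to the smallest period, because max() returns the first
    maximum it encounters and the periods are stored in ascending order.
    """
    series = periodic_ic_series(text, max_period)
    return max(series, key=series.get)


print(guess_key_length("ABCABCABCABC", 6))  # 3
```

For ABCABCABCABC, periods 3 and 6 both score 1.0, and the sketch returns 3.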
Caveats
- Noisy results may mislead you.
- An autokey cipher is sufficient to remove periodic spikes in IC, so periodic analysis may not reveal its key length.
- Coincidences in sufficiently short text may also mislead you.
- Some languages have similar IC patterns. If you don’t know the language of the plaintext, you may be misled.