How it works
Encoding
Shannon entropy will differ depending on the character encoding of your input. For all data encodings, the data is first decoded to a Uint8Array and then converted to a string based on its encoding type (ASCII, Latin1, UTF-8, UTF-16, UTF-32). This step takes precedence over any other ciphertext settings, unless the widget evaluates the ciphertext bytes rather than the encoded text data.
Display encodings are not decoded before analysis and are analyzed as-is, unless the widget is designed to evaluate the ciphertext bytes.
Ciphertext Settings
Shannon Entropy takes into consideration the settings for each ciphertext. These settings currently include:
- Ignore whitespace
- Ignore punctuation
- Ignore casing
- Genericize text
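The settings above can be pictured as a preprocessing step that runs before any entropy is calculated. The following is a minimal Python sketch under stated assumptions: `apply_settings` and its parameters are hypothetical names, "whitespace" means any character matched by `str.isspace()`, "punctuation" means ASCII `string.punctuation`, and casing is folded to lowercase. "Genericize text" is not modeled, because its exact behavior is not described here.

```python
import string

def apply_settings(text: str,
                   ignore_whitespace: bool = False,
                   ignore_punctuation: bool = False,
                   ignore_casing: bool = False) -> str:
    """Hypothetical preprocessing applied before entropy is calculated."""
    if ignore_whitespace:
        # Drop spaces, tabs, newlines, and other whitespace characters.
        text = "".join(ch for ch in text if not ch.isspace())
    if ignore_punctuation:
        # Assumption: only ASCII punctuation is removed; the widget may differ.
        text = "".join(ch for ch in text if ch not in string.punctuation)
    if ignore_casing:
        # Fold to lowercase so 'A' and 'a' count as the same symbol.
        text = text.lower()
    return text

print(apply_settings("Hello, World!", True, True, True))  # helloworld
```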
Shannon Entropy calculations
After all of the above steps are performed, the Shannon entropy calculations are executed and displayed.
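A minimal sketch of the calculation step in Python. It assumes the standard Shannon formula with base-2 logarithms (entropy in bits); `shannon_entropy` is a hypothetical name, not the widget's API. The usage lines also illustrate the Encoding section: decoding the same bytes as UTF-8 or as Latin1 (Python's `latin-1` codec is used as a stand-in) produces different strings, and therefore different entropy values.

```python
import math
from collections import Counter
from collections.abc import Hashable, Sequence

def shannon_entropy(symbols: Sequence[Hashable]) -> float:
    """Shannon entropy in bits: the sum of p * log2(1/p) over distinct symbols."""
    total = len(symbols)
    if total == 0:
        return 0.0  # no data, no entropy
    counts = Counter(symbols)
    return sum((c / total) * math.log2(total / c) for c in counts.values())

# Encoding matters: the same three bytes decode to different strings.
raw = bytes([0xC3, 0xA9, 0x61])
as_utf8 = raw.decode("utf-8")      # "éa"  (2 characters)
as_latin1 = raw.decode("latin-1")  # "Ã©a" (3 characters)
print(shannon_entropy(as_utf8))    # 1.0
print(shannon_entropy(as_latin1))  # about 1.585, i.e. log2(3)
```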
Shannon Entropy formula
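This widget uses the following formula for calculating Shannon entropy:
$$H = -\sum_{i=1}^{k} p_i \log_2 p_i$$
where k is the number of distinct n-grams and p_i is the relative frequency of the i-th distinct n-gram.
Periodic Shannon Entropy formula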
Periodic Shannon Entropy is calculated differently depending on the n-gram mode.
N-gram mode: Block
For this mode, the process is:
- Generate non-overlapping n-grams from the entire text.
- Group the n-grams according to the “sample rate” setting, where the sample rate is the number of n-grams in each entropy calculation.
- Calculate Shannon Entropy for each group using the basic Shannon Entropy formula.
N-gram mode: Sliding window
For this mode, the process is:
- Slide a window of size ngramSize across the text, one character at a time, to generate overlapping n-grams.
- Follow steps 2 and 3 of Block mode above: group the n-grams by sample rate, then calculate Shannon Entropy for each group.
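Both modes can be sketched together in Python. This is a minimal sketch, not the widget's implementation: the names `make_ngrams` and `periodic_entropy` are hypothetical, entropy is assumed to use base-2 logarithms, block mode is assumed to drop a trailing partial n-gram, and the sample rate is assumed to split the n-grams into consecutive, non-overlapping groups (with a possibly shorter final group).

```python
import math
from collections import Counter
from collections.abc import Hashable, Sequence

def shannon_entropy(symbols: Sequence[Hashable]) -> float:
    """Shannon entropy in bits of a sequence of hashable symbols."""
    total = len(symbols)
    counts = Counter(symbols)
    return sum((c / total) * math.log2(total / c) for c in counts.values())

def make_ngrams(text: str, n: int, mode: str) -> list[str]:
    """'block' gives non-overlapping n-grams; 'sliding' gives overlapping ones."""
    if mode == "block":
        usable = len(text) - len(text) % n  # drop a trailing partial block
        return [text[i:i + n] for i in range(0, usable, n)]
    if mode == "sliding":
        return [text[i:i + n] for i in range(len(text) - n + 1)]
    raise ValueError(f"unknown mode: {mode!r}")

def periodic_entropy(text: str, n: int, sample_rate: int, mode: str) -> list[float]:
    """Entropy of each consecutive group of `sample_rate` n-grams.

    Assumption: groups do not overlap, and the final group may be shorter.
    """
    grams = make_ngrams(text, n, mode)
    return [shannon_entropy(grams[i:i + sample_rate])
            for i in range(0, len(grams), sample_rate)]

text = "abababab" + "abcdefgh"
print(periodic_entropy(text, 1, 8, "block"))    # [1.0, 3.0]
print(periodic_entropy(text, 2, 4, "sliding"))
```

The first call shows a repetitive section (1 bit) followed by a section of eight distinct letters (3 bits), which is the kind of change a periodic view is meant to reveal.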
Shannon Entropy Settings
N-grams and sliding window vs. block analysis
Shannon Entropy can be performed on n-grams where n >= 1. For n-grams where n > 1, it is important to understand the difference between sliding window and block analysis.
N-gram mode: Sliding window
Sliding window analysis “slides” across the ciphertext to create overlapping n-grams. For the text Hello with n = 2, this produces He, el, ll, lo.
N-gram mode: Block analysis
Block analysis evaluates your n-grams as non-overlapping chunks. For Hello with n = 2, this produces He, ll.
Notice that the final character, o, is not present. When using block analysis, beware of missing data: the ciphertext length (after all toggles are applied) must be divisible by your n-gram size for all characters to be represented in the Shannon Entropy!
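The difference between the two modes, and the missing data in block mode, can be sketched in a few lines of Python. The function names are hypothetical, and returning the dropped leftover characters is an illustrative choice so that the loss of the final o is visible.

```python
def sliding_ngrams(text: str, n: int) -> list[str]:
    """Overlapping n-grams: the window advances one character at a time."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def block_ngrams(text: str, n: int) -> tuple[list[str], str]:
    """Non-overlapping n-grams, plus the leftover characters that get dropped."""
    usable = len(text) - len(text) % n  # largest multiple of n that fits
    grams = [text[i:i + n] for i in range(0, usable, n)]
    return grams, text[usable:]

print(sliding_ngrams("Hello", 2))  # ['He', 'el', 'll', 'lo']
print(block_ngrams("Hello", 2))    # (['He', 'll'], 'o')
```

A non-empty leftover string is exactly the case where the text length is not divisible by the n-gram size.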
Periodic entropy vs. Table
There are two display options for Shannon Entropy.
Periodic entropy (graph)
Shows a line chart in which each line is the periodic Shannon entropy of one ciphertext. The vertical axis shows the entropy, and the horizontal axis shows how far into the ciphertext each measurement is taken.
Table
Shows a table with a single entropy value for each ciphertext, calculated over the entire ciphertext.
Sample rate
The sample rate is used in periodic analysis to determine how many n-grams are included in each entropy calculation. Generally, a sample rate greater than 16 is recommended for patterns to start emerging.
Practical Application
Shannon entropy can be leveraged to:
- Determine how close a ciphertext is to being random.
- See whether a ciphertext's entropy is relatively close to that of English text.
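One way to express "how close to random" is to divide the measured entropy by the maximum possible entropy, log2 of the alphabet size, which a long, uniformly random text approaches. The sketch below assumes a 26-letter alphabet, unigrams, and base-2 logarithms; `randomness_ratio` is a hypothetical helper, not part of the widget. Natural-language text usually scores noticeably below 1, because some letters are far more common than others.

```python
import math
from collections import Counter

def shannon_entropy(text: str) -> float:
    """Unigram Shannon entropy in bits."""
    total = len(text)
    return sum((c / total) * math.log2(total / c) for c in Counter(text).values())

def randomness_ratio(text: str, alphabet_size: int = 26) -> float:
    """Entropy as a fraction of the maximum, log2(alphabet_size); 1.0 means uniform."""
    return shannon_entropy(text) / math.log2(alphabet_size)

sample = "thequickbrownfoxjumpsoverthelazydog"
print(round(shannon_entropy(sample), 3), round(randomness_ratio(sample), 3))
```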
Caveats
- Short text may not contain enough data to measure entropy reliably.
- This measurement only tells one part of the story. Combine with other tools to get a full picture.
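The short-text caveat follows from a simple bound: a text of length N contains at most min(N, k) distinct symbols for an alphabet of size k, so its unigram entropy cannot exceed log2(min(N, k)) bits. For example, a 10-letter text scores at most log2(10) ≈ 3.32 bits however random it is, below the log2(26) ≈ 4.70 bits available to a long text over a 26-letter alphabet. A minimal sketch, with `max_entropy` as a hypothetical helper:

```python
import math

def max_entropy(length: int, alphabet_size: int = 26) -> float:
    """Upper bound (bits) on unigram entropy for a text of `length` symbols."""
    if length <= 0:
        return 0.0
    # At most min(length, alphabet_size) distinct symbols can appear, and
    # entropy is highest when those symbols are equally frequent.
    return math.log2(min(length, alphabet_size))

for n in (4, 10, 26, 500):
    print(n, round(max_entropy(n), 2))
```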

