How it works
1
Frequency calculation
Genericized text first collects the frequency of each character in your text.
2
Character association
It then associates the most frequent character with the letter A, the next most frequent character with the letter B, and so on.
3
Character replacement
Finally, it replaces each character in your ciphertext with its associated letter to produce the genericized text.
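The three steps above can be sketched in a few lines of Python. This is only an illustration, not CipherInspector's actual implementation: the names `genericize` and `SYMBOLS` are hypothetical, and both the symbol order after Z (lowercase letters, then digits) and the tie-breaking rule (first appearance wins) are assumptions.

```python
from collections import Counter
import string

# Assumed symbol alphabet: A-Z, then a-z, then 0-9 (62 symbols in total).
SYMBOLS = string.ascii_uppercase + string.ascii_lowercase + string.digits


def genericize(text):
    """Replace each character with a symbol ranked by its frequency."""
    # Step 1: collect the frequency of every character.
    counts = Counter(text)
    if len(counts) > len(SYMBOLS):
        raise ValueError("more unique characters than available symbols")

    # Step 2: associate the most frequent character with 'A', the next with 'B', ...
    # most_common() keeps first-seen order for equal counts (assumed tie-break).
    ranked = [ch for ch, _ in counts.most_common()]
    mapping = {ch: SYMBOLS[i] for i, ch in enumerate(ranked)}

    # Step 3: replace every character with its associated symbol.
    return "".join(mapping[ch] for ch in text)


print(genericize("XYYZZZ"))  # CBBAAA
print(genericize("qrrsss"))  # CBBAAA
```

In this sketch, two inputs that differ only by a consistent character substitution produce the same output, provided no two characters tie in frequency.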
Display vs. non-display formats
CipherInspector supports many character encodings. Some are considered “display” formats, such as UTF-8, ASCII, UTF-16, and UTF-32. Others are used to store bytes of data, such as hexadecimal, binary, octal, decimal, and base64. For display formats, the genericized text is calculated by counting each character. For non-display formats, the genericized text is calculated by counting each byte.
Interactions
The genericized text is computed only after ciphertext updates are saved.
- Ignore whitespace
- Ignore casing
- Ignore punctuation
- Reverse text
Practical Application
This can be useful when evaluating whether two pieces of text that use different characters share the same distribution of unique characters.
Caveats
One major caveat is that this is only practical with an alphabet of at most roughly 62 symbols, and ideally no more than 26, because we quickly run out of intuitive symbols to use. It is recommended to limit usage of genericized text to:
- Hexadecimal
- Decimal
- Octal
- Base64
- UTF-8, ASCII, UTF-16, and UTF-32 strings that contain between 1 and 62 unique characters.
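As a rough way to check an input against the limits above, the following Python sketch counts the distinct symbols in a piece of data: characters for a `str` (display formats) and byte values for `bytes` (non-display formats). The function names `alphabet_size` and `suitability` and the returned labels are hypothetical, chosen only for illustration; the thresholds 26 and 62 come from the caveat above.

```python
def alphabet_size(data):
    """Count distinct symbols: characters for str, byte values for bytes."""
    return len(set(data))


def suitability(data, ideal=26, practical=62):
    """Classify the input by how many distinct symbols it contains."""
    size = alphabet_size(data)
    if size <= ideal:
        return "ideal"
    if size <= practical:
        return "practical"
    return "too large"


print(suitability("hello world"))             # ideal (8 unique characters)
print(suitability(bytes.fromhex("deadbeef")))  # ideal (4 unique bytes)
print(suitability(bytes(range(100))))          # too large (100 unique bytes)
```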