
How it works

The Kolmogorov-Smirnov (K-S) test is a statistical method that compares two probability distributions by measuring the maximum vertical distance between their cumulative distribution functions (CDFs).

1. Character Frequency Counting

The widget first counts the frequency of each character (or n-gram) in your ciphertext. This creates an observed distribution of how often each character appears.
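
Here is a rough Python sketch of the counting step. It assumes that later steps work with relative frequencies (each count divided by the total) rather than raw counts. `char_frequencies` is a hypothetical helper, not the widget's own code.

```python
from collections import Counter


def char_frequencies(text: str) -> dict[str, float]:
    """Return the relative frequency of each character in `text`."""
    counts = Counter(text)  # raw occurrence counts per character
    total = sum(counts.values())
    if total == 0:
        return {}
    # Normalise so the frequencies sum to 1 (an observed distribution).
    return {ch: n / total for ch, n in counts.items()}


print(char_frequencies("HELLO"))  # {'H': 0.2, 'E': 0.2, 'L': 0.4, 'O': 0.2}
```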

2. Reference Distribution

A reference distribution is selected based on your comparison mode:
  • Uniform: Equal probability for all observed characters
  • English: Standard English letter frequencies (e.g., E at 12.7%, T at 9.06%)
  • Ciphertext: The character distribution from another ciphertext you select
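
The three reference options could be built roughly as follows. This sketch rests on two assumptions. First, the English percentages come from a commonly cited letter-frequency table, which agrees with the E and T figures above, but the widget's exact table may differ. Second, all function names are hypothetical.

```python
from collections import Counter

# Commonly cited English letter frequencies in percent (assumed table;
# the widget's exact reference values may differ slightly).
ENGLISH_PERCENT = {
    "A": 8.167, "B": 1.492, "C": 2.782, "D": 4.253, "E": 12.702,
    "F": 2.228, "G": 2.015, "H": 6.094, "I": 6.966, "J": 0.153,
    "K": 0.772, "L": 4.025, "M": 2.406, "N": 6.749, "O": 7.507,
    "P": 1.929, "Q": 0.095, "R": 5.987, "S": 6.327, "T": 9.056,
    "U": 2.758, "V": 0.978, "W": 2.360, "X": 0.150, "Y": 1.974,
    "Z": 0.074,
}


def uniform_reference(observed_text: str) -> dict[str, float]:
    """Equal probability for every character that appears in the ciphertext."""
    symbols = sorted(set(observed_text))
    if not symbols:
        return {}
    return {s: 1 / len(symbols) for s in symbols}


def english_reference() -> dict[str, float]:
    """English letter frequencies, renormalised so they sum to 1."""
    total = sum(ENGLISH_PERCENT.values())
    return {ch: pct / total for ch, pct in ENGLISH_PERCENT.items()}


def ciphertext_reference(base_text: str) -> dict[str, float]:
    """Relative character frequencies of another (base) ciphertext."""
    counts = Counter(base_text)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ch: n / total for ch, n in counts.items()}


print(uniform_reference("ABBA"))  # {'A': 0.5, 'B': 0.5}
```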

3. CDF Construction

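Both the observed and reference distributions are converted into cumulative distribution functions. Characters have no natural numeric order, so both CDFs are built over the same fixed ordering of characters. For each character, the CDF gives the total probability of that character plus all characters that come before it in that ordering. Using a different ordering can change the resulting D-statistic.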
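
A minimal sketch of CDF construction follows. It assumes alphabetical order as the shared ordering, since the widget's actual ordering is not documented here. A symbol missing from one distribution contributes zero probability to that distribution's CDF. `build_cdfs` is a hypothetical name.

```python
from itertools import accumulate


def build_cdfs(
    observed: dict[str, float], reference: dict[str, float]
) -> tuple[list[str], list[float], list[float]]:
    """Return (ordering, observed CDF, reference CDF) over a shared ordering."""
    # Assumption: alphabetical order over every symbol seen in either distribution.
    ordering = sorted(set(observed) | set(reference))
    # Running totals of probability; missing symbols count as probability 0.
    obs_cdf = list(accumulate(observed.get(s, 0.0) for s in ordering))
    ref_cdf = list(accumulate(reference.get(s, 0.0) for s in ordering))
    return ordering, obs_cdf, ref_cdf


order, obs, ref = build_cdfs({"A": 0.5, "B": 0.5}, {"A": 0.25, "B": 0.25, "C": 0.5})
print(order)  # ['A', 'B', 'C']
print(obs)    # [0.5, 1.0, 1.0]
print(ref)    # [0.25, 0.5, 1.0]
```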

4. D-Statistic Calculation

The D-statistic is calculated as the maximum absolute difference between the two CDFs at any point. A smaller D-statistic indicates the distributions are more similar.
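
Once both CDFs are evaluated over the same ordering, computing the D-statistic takes a single pass over their pointwise differences. This hypothetical `d_statistic` helper also returns the position of the maximum, which matches what the dotted vertical line marks in the CDF Graph View.

```python
def d_statistic(obs_cdf: list[float], ref_cdf: list[float]) -> tuple[float, int]:
    """Return the maximum absolute CDF difference and the index where it occurs."""
    if len(obs_cdf) != len(ref_cdf) or not obs_cdf:
        raise ValueError("CDFs must be non-empty and share the same ordering")
    diffs = [abs(o - r) for o, r in zip(obs_cdf, ref_cdf)]
    index = max(range(len(diffs)), key=diffs.__getitem__)  # first maximum wins
    return diffs[index], index


d, where = d_statistic([0.5, 1.0, 1.0], [0.25, 0.5, 1.0])
print(d, where)  # 0.5 1
```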

5. P-Value Calculation

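The p-value is the probability of getting a D-statistic at least as large as the observed one, assuming the two distributions are in fact the same. Higher p-values mean the data are consistent with the reference distribution, though they do not prove a match. Lower p-values indicate significant differences.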
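
This page does not say how the widget computes its p-value. One standard option, sketched here as an assumption, is the asymptotic Kolmogorov distribution Q(λ) = 2 Σ (-1)^(k-1) e^(-2k²λ²), where λ = √n · D and n is the sample size. This approximation ignores small-sample corrections. For discrete data such as character counts it also tends to be conservative, meaning its p-values come out somewhat too large. `ks_p_value` is a hypothetical name.

```python
import math


def ks_p_value(d: float, n: float, terms: int = 100) -> float:
    """Approximate P(D >= d) under the null hypothesis for sample size n.

    Uses the asymptotic Kolmogorov distribution
    Q(lam) = 2 * sum_{k>=1} (-1)**(k-1) * exp(-2 * k**2 * lam**2), lam = sqrt(n) * d.
    """
    lam = math.sqrt(n) * d
    # Below lam = 0.2, Q(lam) is within about 1e-12 of 1, and the alternating
    # series would need far more than `terms` terms to converge.
    if lam < 0.2:
        return 1.0
    series = sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
                 for k in range(1, terms + 1))
    # Clamp to [0, 1] to absorb floating-point rounding.
    return max(0.0, min(1.0, 2 * series))


# One-sample case: n is the number of characters (or n-grams) analysed.
print(round(ks_p_value(0.5, 4), 3))  # 0.27
```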

Comparison Modes

Uniform Distribution

Compares your ciphertext against a uniform distribution where every character has equal probability. This is useful for detecting whether encryption has produced evenly distributed output.

English Frequencies

Compares your ciphertext against standard English letter frequencies. Only alphabetic characters (A-Z) are analyzed. This helps identify if text resembles natural English.
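
The filtering step for this mode might look like the sketch below. Folding lowercase letters to uppercase before counting is an assumption, since the page only states that A-Z are kept. `english_letters_only` is a hypothetical name.

```python
import string


def english_letters_only(text: str) -> str:
    """Keep only ASCII letters, folding lowercase to uppercase (assumption)."""
    # string.ascii_letters is 'a'-'z' plus 'A'-'Z'; accented letters are dropped.
    return "".join(ch.upper() for ch in text if ch in string.ascii_letters)


print(english_letters_only("Hello, World! 123"))  # HELLOWORLD
```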

Another Ciphertext

Compares your ciphertext against the character distribution of a selected base ciphertext. Useful for determining if two ciphertexts were encrypted using similar methods or share statistical properties.
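
When both sides are observed samples, the two-sample form of the test applies. The D-statistic is computed the same way, but the asymptotic p-value uses the effective sample size n1*n2/(n1 + n2). The self-contained sketch below assumes alphabetical ordering and the same asymptotic approximation as in the one-sample case. `two_sample_ks` is a hypothetical helper, not the widget's implementation.

```python
import math
from collections import Counter
from itertools import accumulate


def two_sample_ks(text_a: str, text_b: str, terms: int = 100) -> tuple[float, float]:
    """Compare the character distributions of two texts; return (D, p)."""
    count_a, count_b = Counter(text_a), Counter(text_b)
    n_a, n_b = len(text_a), len(text_b)
    if n_a == 0 or n_b == 0:
        raise ValueError("both texts must be non-empty")
    # Assumed shared ordering: alphabetical over every symbol seen in either text.
    ordering = sorted(set(count_a) | set(count_b))
    cdf_a = accumulate(count_a[s] / n_a for s in ordering)  # Counter gives 0 if absent
    cdf_b = accumulate(count_b[s] / n_b for s in ordering)
    d = max(abs(a - b) for a, b in zip(cdf_a, cdf_b))
    # Two-sample asymptotic approximation with effective size n_a*n_b/(n_a+n_b).
    lam = math.sqrt(n_a * n_b / (n_a + n_b)) * d
    if lam < 0.2:  # Q(lam) is indistinguishable from 1 here
        return d, 1.0
    series = sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
                 for k in range(1, terms + 1))
    return d, max(0.0, min(1.0, 2 * series))


d, p = two_sample_ks("AAAA", "BBBB")
print(d, round(p, 3))  # 1.0 0.037
```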

Display Modes

Score View

Displays a table with:
  • D-statistic: The maximum difference between CDFs (0 to 1 scale)
  • P-value: Statistical significance (color-coded for quick interpretation)
  • Interpretation: Human-readable assessment of the result
  • Sample size: Number of characters or n-grams analyzed

CDF Graph View

Displays an interactive chart showing:
  • Solid lines: Observed CDF from your ciphertext
  • Dashed lines: Expected CDF from the reference distribution
  • Dotted vertical line: Location of maximum difference (D-statistic)

N-gram Settings

N-gram Size

Instead of analyzing single characters, you can group characters into n-grams:
  • 1: Single characters (default)
  • 2: Bigrams (pairs like “TH”, “HE”)
  • 3: Trigrams (triplets like “THE”, “AND”)
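Larger n-grams capture patterns in character sequences but require longer texts for meaningful analysis. The number of possible n-grams grows quickly: the 26 letters alone give 676 possible bigrams and 17,576 possible trigrams.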

N-gram Mode

  • Sliding Window: Overlapping n-grams (ABCD → AB, BC, CD)
  • Block: Non-overlapping n-grams (ABCD → AB, CD)
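
The two modes differ only in how far the window advances between n-grams. In this sketch, `ngrams` and its `mode` values are hypothetical. Dropping a trailing incomplete block (for example, the final E in ABCDE with n = 2) is an assumption about how the widget handles leftovers.

```python
def ngrams(text: str, n: int = 1, mode: str = "sliding") -> list[str]:
    """Split text into n-grams using a sliding window or non-overlapping blocks."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if mode == "sliding":
        step = 1  # overlapping: advance one character at a time
    elif mode == "block":
        step = n  # non-overlapping: advance a full n-gram at a time
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # Only complete n-grams are kept; a trailing partial block is dropped (assumption).
    return [text[i:i + n] for i in range(0, len(text) - n + 1, step)]


print(ngrams("ABCD", 2, "sliding"))  # ['AB', 'BC', 'CD']
print(ngrams("ABCD", 2, "block"))    # ['AB', 'CD']
```

The resulting list can be counted exactly like single characters, for example with `collections.Counter`.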

P-Value Interpretation

P-value range     Interpretation
> 0.10            Distributions match well
0.05 to 0.10      Slight deviation
0.01 to 0.05      Significant deviation
< 0.01            Very different distributions
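
If you post-process results yourself, you can encode the bands of the P-Value Interpretation table directly. The page does not say how the widget assigns values that fall exactly on a boundary (0.10, 0.05, 0.01), so the comparisons below are an assumption. `interpret_p_value` is a hypothetical name.

```python
def interpret_p_value(p: float) -> str:
    """Map a p-value to the interpretation bands (boundary handling assumed)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p-value must lie in [0, 1]")
    if p > 0.10:
        return "Distributions match well"
    if p >= 0.05:
        return "Slight deviation"
    if p >= 0.01:
        return "Significant deviation"
    return "Very different distributions"


print(interpret_p_value(0.03))  # Significant deviation
```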

Practical Applications

The Kolmogorov-Smirnov test can be used to:
  • Determine whether ciphertext has a uniform character distribution (consistent with, though not proof of, strong encryption)
  • Identify whether a candidate plaintext resembles natural English
  • Compare multiple ciphertexts to detect similar encryption methods
  • Analyze whether a substitution cipher preserves frequency patterns

Caveats

  • English frequency comparison only analyzes alphabetic characters; non-alphabetic characters are filtered out
  • For n-gram sizes greater than 1, English frequency comparison falls back to uniform distribution (no reference English n-gram frequencies available)
  • Very short texts may produce unreliable p-values due to small sample sizes
  • The test measures overall distribution similarity, not specific character mappings