Language Compression Lab

Assign a variable-length code to each English letter. The lab checks whether your codebook is ambiguous, encodes a target sentence, and scores you by total encoded length. Shorter is better, but the stream must remain uniquely decodable.

Challenge

Target locked
Only A–Z letters are encoded. Spaces and punctuation are ignored for scoring, but word breaks are shown visually.
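The encoding rule above can be sketched in a few lines of Python. This is only an illustration, not the lab's own code: the function name `encode` and the dictionary-based codebook are assumptions, and codes are treated as plain strings of symbols.

```python
def encode(text, codebook):
    """Encode only the A-Z letters of `text` (hypothetical helper).

    Returns (stream, stream_with_breaks). Anything that is not a letter
    A-Z is dropped; whitespace only decides where the visual breaks go.
    """
    encoded_words = []
    for word in text.upper().split():
        letters = [ch for ch in word if "A" <= ch <= "Z"]
        if letters:
            encoded_words.append("".join(codebook[ch] for ch in letters))
    stream = "".join(encoded_words)          # what is scored
    with_breaks = " ".join(encoded_words)    # display only
    return stream, with_breaks


# Example with a tiny two-letter codebook (assumed for illustration).
stream, shown = encode("Hi, hi!", {"H": "0", "I": "10"})
print(len(stream), shown)
```

Under these assumptions the score is `len(stream)`, and the average length is that score divided by the number of encoded letters.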
  • Encoded length: code symbols in the target.
  • Average length: symbols per encoded letter.
  • Compression vs fixed: change relative to the fixed-length baseline.
  • Entropy floor: theoretical lower bound derived from the target's letter entropy.
  • Huffman target: optimal prefix-code length.
  • Session best: best valid attempt in this session.
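As a rough sketch of how these reference figures could be computed, the Python below assumes a binary code alphabet and a fixed baseline of ceil(log2(26)) = 5 symbols per letter. The page does not state either, so both are assumptions, and all function names are hypothetical.

```python
import heapq
import math
from collections import Counter


def letter_counts(text):
    """Count only A-Z letters, case-insensitively."""
    return Counter(ch for ch in text.upper() if "A" <= ch <= "Z")


def entropy_floor(counts):
    """Shannon bound on total length in bits: -sum(n * log2(n / total))."""
    total = sum(counts.values())
    return -sum(n * math.log2(n / total) for n in counts.values())


def huffman_total(counts):
    """Total encoded length of an optimal binary prefix code.

    Repeatedly merging the two smallest weights and summing the merged
    weights gives the same total as building the tree explicitly.
    """
    heap = list(counts.values())
    if len(heap) == 1:
        return heap[0]  # a lone letter still needs one symbol per use
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total


def fixed_baseline(counts, alphabet_size=26):
    """Fixed-length cost, assuming ceil(log2(alphabet_size)) bits per letter."""
    return sum(counts.values()) * math.ceil(math.log2(alphabet_size))


counts = letter_counts("AAA BB C")
print(huffman_total(counts), fixed_baseline(counts), round(entropy_floor(counts), 3))
```

With these definitions the entropy floor never exceeds the Huffman total, and a compression figure could be reported as `1 - encoded_length / fixed_baseline(counts)`.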

Encoded stream

With visual word breaks

What the checks mean
  • Duplicate code: two letters share the same code. That is immediately ambiguous.
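  • Prefix warning: one code is a prefix of another (for example, 0 and 01). The codebook may still be uniquely decodable, but it cannot be decoded instantaneously: the decoder may have to read ahead before committing to a letter.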
  • Uniquely decodable: every valid stream maps to only one letter sequence. This is the minimum requirement for the game.
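  • Huffman code: a prefix-free code with the smallest possible total encoded length for the letter frequencies of this target sentence. It is not necessarily unique; several codebooks can reach the same length.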
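The first three checks can be sketched as follows. This is a minimal illustration, not the lab's actual implementation: the function names are hypothetical, and the unique-decodability test uses the standard Sardinas-Patterson procedure, which the page does not name.

```python
def duplicate_codes(codebook):
    """Return pairs of letters that share exactly the same code."""
    first_owner, dups = {}, []
    for letter, code in codebook.items():
        if code in first_owner:
            dups.append((first_owner[code], letter))
        else:
            first_owner[code] = letter
    return dups


def prefix_pairs(codebook):
    """Return (a, b) where a's code is a proper prefix of b's code."""
    items = list(codebook.items())
    return [(a, b) for a, ca in items for b, cb in items
            if a != b and ca != cb and cb.startswith(ca)]


def is_uniquely_decodable(codebook):
    """Sardinas-Patterson test on the set of codes."""
    codes = list(codebook.values())
    if "" in codes or len(set(codes)) != len(codes):
        return False
    code_set = set(codes)

    def dangling(xs, ys):
        # Non-empty leftovers w such that x + w == y.
        return {y[len(x):] for x in xs for y in ys
                if len(y) > len(x) and y.startswith(x)}

    seen = set()
    current = dangling(code_set, code_set)
    while current:
        if current & code_set:
            return False  # a leftover equals a code: two parses exist
        seen |= current
        nxt = dangling(code_set, current) | dangling(current, code_set)
        current = nxt - seen  # only explore leftovers not yet processed
    return True


book = {"A": "0", "B": "01", "C": "11"}
print(prefix_pairs(book), is_uniquely_decodable(book))
```

Here `book` triggers a prefix warning (0 is a prefix of 01) yet remains uniquely decodable, whereas replacing C's code with 10 makes the stream 010 decodable as either A C or B A.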

Codebook

Legend: red = invalid/ambiguous; yellow = prefix warning
Letter | Code | Freq. | Contribution | Issues