This repository is holds a simple implementation of n-ary Huffman coding in Python.
Huffman coding is a method of encoding messages with variable length, prefix-free codes that minimizes the average message length. That means that usually messages sent with Huffman codes are, on average, shorter and fairly easy to decode. More details can be found in Huffman's original paper, kept here.
Without doing much research, binary Huffman codes seem to be familiar to people in the know, but n-ary Huffman codes seem to be rarer. Wikipedia even says that "for n greater than 2, not all sets of source words can properly form an n-ary tree for Huffman coding" (2016-03-27). Where this is coming from is a bit of a mystery, because Huffman clearly gives a way to construct n-ary codes. For more about this, see "Generalization" in notes.pdf.
import huffman
import freq
example = "large corpus of text"
frequency_dict = freq.str_freq(example)
freqs = list(frequency_dict.items())
ternary_huffman = huffman.HuffmanCode(freqs, 3)
print(ternary_huffman.decode(ternary_huffman.encode(example)))
$ ./demo.py
The probabilities sum to 1.0000000000000002 (!= 1)...
(but they are close)
ascii encoding:
0100110101111001001000000110110101101001011100110111010001110010011001
0101110011011100110010011100100000011001010111100101100101011100110010
0000011000010111001001100101001000000110111001101111011101000110100001
1010010110111001100111001000000110110001101001011010110110010100100000
0111010001101000011001010010000001110011011101010110111000111011000010
1001000011011011110111001001100001011011000010000001101001011100110010
0000011001100110000101110010001000000110110101101111011100100110010100
1000000111001001100101011001000010000001110100011010000110000101101110
0010000001101000011001010111001000100000011011000110100101110000011100
1100100111001000000111001001100101011001000011101100001010010010010110
0110001000000111001101101110011011110111011100100000011000100110010100
1000000111011101101000011010010111010001100101001011000010000001110111
0110100001111001001000000111010001101000011001010110111000100000011010
0001100101011100100010000001100010011100100110010101100001011100110111
0100011100110010000001100001011100100110010100100000011001000111010101
1011100011101100001010010010010110011000100000011010000110000101101001
0111001001110011001000000110001001100101001000000111011101101001011100
1001100101011100110010110000100000011000100110110001100001011000110110
1011001000000111011101101001011100100110010101110011001000000110011101
1100100110111101110111001000000110111101101110001000000110100001100101
0111001000100000011010000110010101100001011001000010111000001010010010
0100100000011010000110000101110110011001010010000001110011011001010110
0101011011100010000001110010011011110111001101100101011100110010000001
1001000110000101101101011000010111001101101011001001110110010000101100
0010000001110010011001010110010000100000011000010110111001100100001000
0001110111011010000110100101110100011001010010110000001010010000100111
0101011101000010000001101110011011110010000001110011011101010110001101
1010000010000001110010011011110111001101100101011100110010000001110011
0110010101100101001000000100100100100000011010010110111000100000011010
0001100101011100100010000001100011011010000110010101100101011010110111
0011001110110000101001000001011011100110010000100000011010010110111000
1000000111001101101111011011010110010100100000011100000110010101110010
0110011001110101011011010110010101110011001000000110100101110011001000
0001110100011010000110010101110010011001010010000001101101011011110111
0010011001010010000001100100011001010110110001101001011001110110100001
1101000000101001010100011010000110000101101110001000000110100101101110
0010000001110100011010000110010100100000011000100111001001100101011000
0101110100011010000010000001110100011010000110000101110100001000000110
0110011100100110111101101101001000000110110101111001001000000110110101
1010010111001101110100011100100110010101110011011100110010000001110010
0110010101100101011010110111001100101110000010100100100100100000011011
0001101111011101100110010100100000011101000110111100100000011010000110
0101011000010111001000100000011010000110010101110010001000000111001101
1100000110010101100001011010110010110000100000011110010110010101110100
0010000001110111011001010110110001101100001000000100100100100000011010
1101101110011011110111011100001010010101000110100001100001011101000010
0000011011010111010101110011011010010110001100100000011010000110000101
1101000110100000100000011000010010000001100110011000010111001000100000
0110110101101111011100100110010100100000011100000110110001100101011000
0101110011011010010110111001100111001000000111001101101111011101010110
1110011001000011101100001010010010010010000001100111011100100110000101
1011100111010000100000010010010010000001101110011001010111011001100101
0111001000100000011100110110000101110111001000000110000100100000011001
1101101111011001000110010001100101011100110111001100100000011001110110
1111001110110000101001001101011110010010000001101101011010010111001101
1101000111001001100101011100110111001100101100001000000111011101101000
0110010101101110001000000111001101101000011001010010000001110111011000
0101101100011010110111001100101100001000000111010001110010011001010110
0001011001000111001100100000011011110110111000100000011101000110100001
1001010010000001100111011100100110111101110101011011100110010000111010
0000101000100000001000000010000001000001011011100110010000100000011110
0101100101011101000010110000100000011000100111100100100000011010000110
0101011000010111011001100101011011100010110000100000010010010010000001
1101000110100001101001011011100110101100100000011011010111100100100000
0110110001101111011101100110010100100000011000010111001100100000011100
1001100001011100100110010100001010001000000010000000100000010000010111
0011001000000110000101101110011110010010000001110011011010000110010100
1000000110001001100101011011000110100101100101011001000010000001110111
0110100101110100011010000010000001100110011000010110110001110011011001
0100100000011000110110111101101101011100000110000101110010011001010010
11100010000000001010
binary encoding:
1110100111001110011101110110110011100101001111001100101111110001110011
1011110000010010100110011110110101110001011011011110100010001101101011
0100100011001110001010110011001111111111101011110111110100110011110101
0100100110110001011011000011111100100101000111011110101010011001010011
1000000111000101010011110000101011101000110110101101001101110010111111
0010100111000010111101111101001011111110001100111101101011011100111010
1011001101110101101101110001110001100110111010110011100111000101011111
1000010101110100011101011010011010011001110011000001001010011001000011
1111111110101111011111010010111111100001010100101101010110000111010101
1001101111011010100111100100011001110101110110010010111001001000011011
1101101010011110000100010101011010110111001101011110000101011101000010
1011010010000111010001111101001010001010100101110101100110001101111110
0010101101011000111100001000001001110110100110010010010111111100001000
1100101001110000000100111101000000110111010110110111000111000111111101
1101001011111111110000111101101000110011111111011100010100101011010110
0011110000110001101100100101001011011110000101011101000101110001010110
1110010011001011110111110101111101111010000001011011110001100110101110
1101100100110101110101111110111111111101101111000010110110000111000101
0111010011001110111101010100110010000011110110101101000100101111001111
1010011000010101001111000101101111000111000101011001110101101001101001
1100010100111000101010011100001111110101011010111011001110111001110011
1011101101100111001010011110011000010100110111001001100111010001111101
0010100110110110101011101011001110011010000101011010010100001010111010
0011001001101011010010010010001100100111011111000011011101111011011011
0001001010010010011110110101101111111101001100001010100111000011101111
1111111001011010111000001010100111000101000100001111110010010100011101
1110101010011001001101110110011010011001011011110100010001100110101111
1111111010000101111011111010010100100010101001001111011100001001010011
1100111011101011101000110001001101110001000010001011010100001000001111
0011000010001011010101111011111011101001110011100111011101101100111001
0100111100110010001100110111010101111110001100010101100110111010011011
0100100110010001100111001010011010010000110000110101111000111000101011
0010001010101101011111111111010000100110010111110000000101111101111010
0000010011101111100100011001110101100111000101011010010111010111111010
0011001001010011100010110110111101001000011101110011100110110110101011
1010110001001100001010010010100111111100000001011111011000001001111010
0111001100010101100111010101111011010110011100000011011110110111000101
0011111100100110110110001100101110011010111011100110101001010011111010
0000111110
hex encoding:
e73eff25c7bdcce1fdefdcf9bdf867a58ebf05eddf7adfcea8e63e716b90f5cfe99bf2
6bdfbd4f7a98fadbf05e3ce1fbd4e63eee9fc861fe8df1a57decf1aeff7ad8fadbfe8b
d9c7cf9bdf4ea8e63eee9fa95bcfe8df15bdcecfe809e4edf15bdcfebb61f68fadbfad
94e23eefa9e5dfcdd8fb6cdcf4929cede14ecfbd4f984f1a57dec3e72ea7f86fceae4a
fb6cdcfcddfeef58fadbfe4addedce63e084f58fc62dfe3dbe9ea2dcf5cf7adbdf26bd
f4d05eba73e74a98f58f7adfe8bd97af7a97fe9b62f2eff25c7bdccfbddedce23eef06
e5df76fad9bfadbfce3d9edecfefd7f1d00feefed8613e74a97f2eac5e4fa97af9fe99
bf26bdfe30d9c58ebfc6ea84e63eefebb987feef8de5dbfc91f9feb644dccfeb6e63e7
3eff25c7bdccecf1ad8fcadf190edcecf7bd94cf68f7adfebb6ea84e703fffe084fefd
7ecfe8effad9e5d8ecfeef7a58edf2eff06e5df9cfb9bd3fffe0cf98effcadfe8d05d4
f157afe990cdfe462e39bde2f3
Decoding hex:
My mistress' eyes are nothing like the sun;
Coral is far more red than her lips' red;
If snow be white, why then her breasts are dun;
If hairs be wires, black wires grow on her head.
I have seen roses damask'd, red and white,
But no such roses see I in her cheeks;
And in some perfumes is there more delight
Than in the breath that from my mistress reeks.
I love to hear her speak, yet well I know
That music hath a far more pleasing sound;
I grant I never saw a goddess go;
My mistress, when she walks, treads on the ground:
And yet, by heaven, I think my love as rare
As any she belied with false compare.
Sizes relative to ascii:
ascii: 1.0
binary: 0.5426829268292683
hex: 0.1475609756097561