Huffman’s Algorithm

The Huffman coding scheme generates an optimal code: no uniquely decodable code for the same source has a smaller average codeword length.

  • All probabilities must be known before any codeword can be found
  • There is no closed-form formula for the codeword lengths

  • For an $r$-ary code, the number of source symbols $n$ must satisfy $n \equiv 1 \pmod{r-1}$ (if it does not, pad with zero-probability dummy symbols)
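The padding condition above can be sketched as a small helper; the function name `dummies_needed` is hypothetical, chosen here for illustration:

```python
def dummies_needed(n, r):
    """Number of zero-probability dummy symbols to add so that the
    padded symbol count satisfies n' ≡ 1 (mod r-1) for an r-ary code."""
    return (1 - n) % (r - 1)

print(dummies_needed(6, 3))  # 1: pad 6 symbols to 7, since 7 ≡ 1 (mod 2)
print(dummies_needed(7, 2))  # 0: for a binary code every n ≥ 2 already works
```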

Place-low Strategy

Place $s_{a,b}$ (the node formed by combining $s_a$ and $s_b$) as low as possible in the sorted list.

  • Combine the two codewords with the lowest probabilities
  • Sort the probabilities, placing the combined probability as low as possible
  • Repeat
  • Then label the branches at each combination top-down $0, 1, \dots, r-1$

Then read off each codeword by tracing the path from the root of the tree to the symbol's leaf.

Place-high Strategy

Place $s_{a,b}$ as high as possible.

  • Combine the two codewords with the lowest probabilities
  • Sort the probabilities, placing the combined probability as high as possible
  • Repeat
  • Then label the branches at each combination top-down $0, 1, \dots, r-1$
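The two strategies can be sketched in one routine that computes codeword lengths for a binary Huffman code, with a flag selecting where a combined node is placed among ties. This is a minimal sketch, not the only way to implement the tie-breaking; the function name `huffman_lengths` and the example distribution are assumptions:

```python
import bisect

def huffman_lengths(probs, place_high=True):
    # Nodes are kept sorted by descending probability, so "high" means
    # toward the front of the list. Each node is (probability, symbol set).
    nodes = sorted(((p, {i}) for i, p in enumerate(probs)),
                   key=lambda t: -t[0])
    lengths = [0] * len(probs)
    while len(nodes) > 1:
        p2, s2 = nodes.pop()          # the two lowest probabilities
        p1, s1 = nodes.pop()
        merged = (p1 + p2, s1 | s2)
        for i in merged[1]:
            lengths[i] += 1           # one more edge on the path to the root
        keys = [-p for p, _ in nodes]
        # place-high: leftmost slot among equal probabilities;
        # place-low: rightmost slot among equal probabilities
        insert = bisect.bisect_left if place_high else bisect.bisect_right
        nodes.insert(insert(keys, -merged[0]), merged)
    return lengths

probs = [0.4, 0.2, 0.2, 0.1, 0.1]
print(huffman_lengths(probs, place_high=True))   # [2, 2, 2, 3, 3]
print(huffman_lengths(probs, place_high=False))  # [1, 2, 3, 4, 4]
```

Both placements yield the same average length (here $2.2$ bits), but place-high spreads the lengths more evenly, giving a lower variance.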

Average Length

The usual method to calculate the average length is to weight each codeword's length by its probability: $L = \sum_i p_i l_i$.
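As a minimal sketch of the formula above (the function name and example values are illustrative only):

```python
def average_length(probs, lengths):
    # L = sum over all symbols of p_i * l_i
    return sum(p * l for p, l in zip(probs, lengths))

# Codeword lengths 1, 2, 3, 4, 4 for probabilities 0.4, 0.2, 0.2, 0.1, 0.1
# give an average length of 2.2 bits per symbol.
print(average_length([0.4, 0.2, 0.2, 0.1, 0.1], [1, 2, 3, 4, 4]))
```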

Knuth’s Theorem

Knuth’s theorem provides a shortcut for finding the average length.

The average codeword length $L$ of a Huffman code equals the sum of the probabilities of all combined (internal) nodes created during the algorithm.
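The shortcut can be sketched by running only the combination steps and accumulating each combined probability, with no need to track individual codeword lengths; the function name `knuth_average_length` is an assumption for illustration:

```python
import heapq

def knuth_average_length(probs):
    # Repeatedly combine the two lowest probabilities; by Knuth's theorem
    # the running sum of combined-node probabilities is the average length.
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total

# Combined nodes for 0.4, 0.2, 0.2, 0.1, 0.1 are 0.2, 0.4, 0.6, 1.0,
# summing to 2.2 — the same value as the direct sum of p_i * l_i.
print(knuth_average_length([0.4, 0.2, 0.2, 0.1, 0.1]))
```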