❖ Text Compression |
Problem: Efficiently encode a given string T by a smaller string E
Applications:
00001❖ ... Text Compression |
Code …
Prefix code …
❖ ... Text Compression |
Example:
T as ascii chars would require 56 bits; E needs only 16 bits
❖ ... Text Compression |
Example: T = abracadabra
Encoding with T1 is 010.11.011.010.00.010.10.010.11.011.010
Encoding with T2 is 00.10.11.00.010.00.011.00.10.11.00
The dots are there only to distinguish characters; they are not part of the encoding.
❖ ... Text Compression |
Text compression problem
Given a text T …
❖ ... Text Compression |
Huffman's algorithm
abracadabra
❖ Huffman Code |
Huffman's algorithm using a priority queue:
HuffmanCode(T):
| Input string T of size n
| Output optimal encoding tree for T
|
| compute frequency array
| Q = new priority queue (ordered low key to high key)
| for all characters c do
| T = new single-node tree storing c
| join(Q,T) with frequency(c) as key
| end for
| while |Q|≥2 do
| f1 = Q.minKey(), T1 = leave(Q)
| f2 = Q.minKey(), T2 = leave(Q)
| T = new tree node with subtrees T1 and T2
| join(Q,T) with f1+f2 as key
| end while
| return leave(Q)
❖ Decompression |
Decompression involves repeated traversal of paths in tree …
decompress(B, T): | Input bit-string B, encoding tree T | Output original string | | start from root of Tree | for each b in Bits do | if b = 1 then | go right in Tree | else | go left in Tree | end if | if reached leaf then | print char in leaf | return to root of Tree | end if | end for
❖ Analysis of Huffman Encoding |
Analysis of Huffman's algorithm:
Gives complexity: O(m + v log v) time
Many variations exist to improve compression (e.g. gzip, bzip2, xz)
Produced: 9 Aug 2020