A BST can degenerate into a linked list when its input values are inserted in sorted or reverse-sorted order.
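The degeneration can be seen with a minimal sketch of an unbalanced BST. The `Node`, `insert`, `height`, and `build` names are illustrative assumptions, not part of these notes. Inserting keys in sorted order gives a tree whose height equals the number of keys, while a mixed order keeps the tree shallow.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Plain (unbalanced) BST insert; returns the root of the tree."""
    if root is None:
        return Node(key)
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = Node(key)
                break
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = Node(key)
                break
            cur = cur.right
    return root


def height(root):
    """Number of nodes on the longest root-to-leaf path."""
    best, stack = 0, [(root, 1)]
    while stack:
        node, depth = stack.pop()
        if node is None:
            continue
        best = max(best, depth)
        stack.append((node.left, depth + 1))
        stack.append((node.right, depth + 1))
    return best


def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root


sorted_height = height(build([1, 2, 3, 4, 5, 6, 7]))    # every node has only a right child
reversed_height = height(build([7, 6, 5, 4, 3, 2, 1]))  # every node has only a left child
mixed_height = height(build([4, 2, 6, 1, 3, 5, 7]))     # perfectly balanced for 7 keys
print(sorted_height, reversed_height, mixed_height)  # 7 7 3
```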
Self-Balanced Trees
Character | Decimal | Binary |
---|---|---|
A | 65 | 1000001 |
B | 66 | 1000010 |
C | 67 | 1000011 |
... | ... | ... |
X | 88 | 1011000 |
Y | 89 | 1011001 |
Z | 90 | 1011010 |
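
The rows of the table above can be regenerated with a short sketch. The helper name `ascii_rows` is an assumption made for illustration; `ord` and `format` are Python built-ins.

```python
import string


def ascii_rows(chars):
    """Return (character, decimal code, 7-bit binary string) for each character."""
    return [(c, ord(c), format(ord(c), "07b")) for c in chars]


rows = ascii_rows(string.ascii_uppercase)
for c, dec, bits in rows:
    print(f"{c} | {dec} | {bits} |")
```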
Standard ASCII is a fixed-width encoding: each character is encoded with 7 bits.
With 7 bits, we can encode $2^7 = 128$ different codes.
To encode the alphabet A-Z (uppercase only), the smallest number of bits needed is $\lceil \log_2 26 \rceil = 5$, since $2^4 = 16 < 26 \le 32 = 2^5$.
Using a fixed-width encoding scheme with $n$ bits per character, storing or transmitting $m$ characters takes $m \times n$ bits for the entire file.
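
The arithmetic above can be checked with a small sketch. The function names `bits_per_symbol` and `fixed_width_total` are hypothetical; the bit count uses integer arithmetic rather than a floating-point logarithm to avoid rounding surprises.

```python
def bits_per_symbol(alphabet_size):
    """Smallest n such that 2**n >= alphabet_size (at least 1 bit)."""
    return max(1, (alphabet_size - 1).bit_length())


def fixed_width_total(m, n):
    """Bits needed to store m characters at n bits each."""
    return m * n


print(bits_per_symbol(26))        # 5
print(bits_per_symbol(128))       # 7
print(fixed_width_total(22, 7))   # 154
```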
Huffman Coding: an algorithm that uses a binary tree to compress data by assigning shorter codes to more frequent characters.
Use cases:
"SUSIE SAYS IT IS EASY"
Character | Count (Frequency) |
---|---|
S | 6 |
Space | 4 |
I | 3 |
A | 2 |
E | 2 |
Y | 2 |
T | 1 |
U | 1 |
Linefeed | 1 |
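
The counts in the frequency table can be reproduced with `collections.Counter`. This is a small sketch that assumes the message ends with a single linefeed character, as the table implies.

```python
from collections import Counter

message = "SUSIE SAYS IT IS EASY\n"  # trailing "\n" is the linefeed in the table
freq = Counter(message)

# Print most frequent characters first; repr() makes space and linefeed visible.
for ch, count in sorted(freq.items(), key=lambda kv: (-kv[1], kv[0])):
    print(repr(ch), count)
```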
Step 1
Lf(1) U(1) T(1) Y(2) E(2) A(2) I(3) SP(4) S(6)
Step 2
T(1)    (2)     Y(2) E(2) A(2) I(3) SP(4) S(6)
        / \
    Lf(1) U(1)
Step 3
Y(2) E(2) A(2)    (3)      I(3) SP(4) S(6)
                  / \
              T(1)   (2)
                     / \
                 Lf(1) U(1)
Step 4
A(2)    (3)        I(3)    (4)      SP(4) S(6)
        / \                / \
    T(1)   (2)         Y(2)   E(2)
           / \
       Lf(1) U(1)
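
The steps above follow one rule: remove the two lowest-weight entries from the front of the sorted queue, join them under a new node whose weight is their sum, and put that node back in sorted position. The sketch below repeats this until one tree remains. The function name `merge_steps` and the tie-break (a new node goes in front of entries of equal weight) are assumptions chosen to match the ordering shown in Steps 2 to 4.

```python
def merge_steps(queue):
    """Repeatedly merge the two lowest-weight entries of a sorted queue.

    `queue` is a list of (weight, label) pairs in ascending weight order.
    Returns the queue contents before the first merge and after every merge.
    """
    queue = list(queue)
    history = [list(queue)]
    while len(queue) > 1:
        (w1, a), (w2, b) = queue[0], queue[1]
        merged = (w1 + w2, f"({a} {b})")
        rest = queue[2:]
        # Assumed tie-break: insert before entries with the same weight.
        pos = 0
        while pos < len(rest) and rest[pos][0] < merged[0]:
            pos += 1
        rest.insert(pos, merged)
        queue = rest
        history.append(list(queue))
    return history


start = [(1, "Lf"), (1, "U"), (1, "T"), (2, "Y"), (2, "E"),
         (2, "A"), (3, "I"), (4, "SP"), (6, "S")]
history = merge_steps(start)
for step, q in enumerate(history, start=1):
    print(f"Step {step}:", "  ".join(f"{label}[{w}]" for w, label in q))
```

Nine leaves need eight merges, so the run ends at Step 9 with a single tree whose weight is the total character count.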
Final Huffman Tree
                    Root(22)
             __________|__________
           0|                    1|
           (9)                  (13)
        ____|____             ____|____
      0|        1|          0|        1|
     SP(4)      (5)        S(6)       (7)
              ___|___               ___|___
            0|      1|            0|      1|
            A(2)    (3)          I(3)     (4)
                   __|__                 __|__
                 0|    1|              0|    1|
                 T(1)  (2)            Y(2)  E(2)
                      __|__
                    0|    1|
                   Lf(1)  U(1)
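
Once the tree is built, each character's code is the sequence of edge labels (0 for left, 1 for right) on the path from the root to its leaf. The sketch below builds a tree from the example frequencies and reads the codes off by recursion. The names `build_tree` and `assign_codes`, the leaf order, and the tie-break rule are assumptions; a different tie-break can produce different bit patterns with the same total length.

```python
def build_tree(leaves):
    """Build a Huffman tree from (character, weight) pairs sorted by weight.

    Leaves are single characters; internal nodes are (left, right) tuples.
    """
    queue = [(weight, ch) for ch, weight in leaves]
    while len(queue) > 1:
        (w1, left), (w2, right) = queue[0], queue[1]
        merged = (w1 + w2, (left, right))
        rest = queue[2:]
        # Assumed tie-break: a new node goes before entries of equal weight.
        pos = 0
        while pos < len(rest) and rest[pos][0] < merged[0]:
            pos += 1
        rest.insert(pos, merged)
        queue = rest
    return queue[0][1]


def assign_codes(tree, prefix=""):
    """Walk the tree: append '0' going left and '1' going right."""
    if isinstance(tree, str):          # leaf: a single character
        return {tree: prefix or "0"}
    left, right = tree
    codes = assign_codes(left, prefix + "0")
    codes.update(assign_codes(right, prefix + "1"))
    return codes


# Leaf order mirrors the sorted queue of Step 1 ("\n" is the linefeed).
leaves = [("\n", 1), ("U", 1), ("T", 1), ("Y", 2), ("E", 2),
          ("A", 2), ("I", 3), (" ", 4), ("S", 6)]
codes = assign_codes(build_tree(leaves))
message = "SUSIE SAYS IT IS EASY\n"
encoded = "".join(codes[ch] for ch in message)
for ch, code in sorted(codes.items(), key=lambda kv: (len(kv[1]), kv[1])):
    print(repr(ch), code)
print(len(encoded), "bits")
```

No code is a prefix of another because characters sit only at the leaves, so the encoded bit string can be decoded without separators.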
Character | Huffman Code | Count |
---|---|---|
S | 10 | 6 |
Space | 00 | 4 |
I | 110 | 3 |
A | 010 | 2 |
E | 1111 | 2 |
Y | 1110 | 2 |
T | 0110 | 1 |
U | 01111 | 1 |
Linefeed | 01110 | 1 |

Encoding Method | Total Number of Bits |
---|---|
Fixed width encoding | $22 \times 7 = 154$ |
Huffman coding | $6 \times 2 + 4 \times 2 + 3 \times 3 + 2 \times 3 + 2 \times 4 + 2 \times 4 + 1 \times 4 + 1 \times 5 + 1 \times 5 = 65$ |
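
The Huffman total can be cross-checked without building the tree. Each merge pushes every character beneath it one level deeper, so the total encoded length equals the sum of the weights of all merged nodes. The sketch below uses the standard-library `heapq` as the priority queue; the function name `huffman_total_bits` is an assumption, and a message with only one distinct character is not handled (it would report 0).

```python
import heapq
from collections import Counter


def huffman_total_bits(message):
    """Total Huffman-encoded length: sum of the weights of all merged nodes."""
    heap = list(Counter(message).values())
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total


message = "SUSIE SAYS IT IS EASY\n"
fixed_bits = len(message) * 7          # 7-bit fixed-width ASCII
huffman_bits = huffman_total_bits(message)
print(fixed_bits, huffman_bits)        # 154 65
```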