A BST can degenerate into a linked list when its input values are inserted in sorted or reverse-sorted order, so search, insert, and delete take $O(n)$ time instead of $O(\log n)$.
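The degeneration can be observed by measuring tree height. The following is a minimal sketch, assuming a plain unbalanced BST; the names `Node`, `insert`, `height`, and `build` are illustrative and not taken from these notes.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key into an unbalanced BST (iteratively) and return the root."""
    new = Node(key)
    if root is None:
        return new
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = new
                return root
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = new
                return root
            cur = cur.right


def height(root):
    """Number of nodes on the longest root-to-leaf path."""
    best = 0
    stack = [(root, 1)] if root is not None else []
    while stack:
        node, depth = stack.pop()
        best = max(best, depth)
        for child in (node.left, node.right):
            if child is not None:
                stack.append((child, depth + 1))
    return best


def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root


# Sorted input: every node has only a right child, so height equals n.
sorted_height = height(build(range(1, 16)))
# Same 15 keys inserted level by level: the tree stays perfectly balanced.
mixed_height = height(build([8, 4, 12, 2, 6, 10, 14, 1, 3, 5, 7, 9, 11, 13, 15]))
print(sorted_height, mixed_height)
```

With 15 keys, the sorted insertion order produces a chain of 15 nodes, while the level-by-level order produces a tree of height 4.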
Self-Balanced Trees
| Character | Decimal | Binary |
|---|---|---|
| A | 65 | 1000001 |
| B | 66 | 1000010 |
| C | 67 | 1000011 |
| ... | ... | ... |
| X | 88 | 1011000 |
| Y | 89 | 1011001 |
| Z | 90 | 1011010 |
Standard ASCII is a fixed-width encoding: each character is encoded with 7 bits.
With 7 bits, we can encode $2^7 = 128$ different codes.
To encode the alphabet A-Z (uppercase only), the smallest number of bits needed is $\lceil \log_2 26 \rceil = 5$.
Using a fixed-width encoding scheme with $n$ bits per character, storing or transmitting $m$ characters requires $m \times n$ bits for the entire file.
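The arithmetic above can be checked with a short sketch. The helper names `bits_per_symbol` and `fixed_width_total` are hypothetical, and the integer `bit_length` trick is just one way to compute $\lceil \log_2 k \rceil$ without floating point.

```python
def bits_per_symbol(alphabet_size):
    """Smallest n such that 2**n >= alphabet_size (ceil(log2) for sizes >= 2)."""
    return max(1, (alphabet_size - 1).bit_length())


def fixed_width_total(num_chars, bits):
    """Total bits for num_chars characters at a fixed width: m * n."""
    return num_chars * bits


# 7-bit binary form of a character, as in the ASCII table.
a_binary = format(ord("A"), "07b")
z_binary = format(ord("Z"), "07b")

upper_bits = bits_per_symbol(26)        # uppercase A-Z only
ascii_bits = bits_per_symbol(2 ** 7)    # full 7-bit ASCII range

print(a_binary, z_binary, upper_bits, ascii_bits)
print(fixed_width_total(22, 7))         # a 22-character file in 7-bit ASCII
```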
Huffman coding: an algorithm that uses a binary tree to compress data by assigning shorter codes to more frequent characters.
Use cases:
Example message: "SUSIE SAYS IT IS EASY" (followed by a linefeed character)
| Character | Count (Frequency) |
|---|---|
| S | 6 |
| Space | 4 |
| I | 3 |
| A | 2 |
| E | 2 |
| Y | 2 |
| T | 1 |
| U | 1 |
| Linefeed | 1 |
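A frequency table like this one can be produced with the standard library. This is a small sketch that assumes the message ends with a single linefeed, which is what the `Linefeed` row suggests; the variable names are illustrative.

```python
from collections import Counter

# The example message plus a trailing linefeed (assumed from the table above).
message = "SUSIE SAYS IT IS EASY\n"

freq = Counter(message)

# Print characters from most to least frequent; repr() makes the space
# and the linefeed visible.
for ch, n in freq.most_common():
    print(repr(ch), n)

total_chars = sum(freq.values())
```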
Step 1
Lf(1) U(1) T(1) Y(2) E(2) A(2) I(3) SP(4) S(6)
Step 2
T(1)   (2)   Y(2)  E(2)  A(2)  I(3)  SP(4)  S(6)
       / \
  Lf(1)   U(1)
Step 3
Y(2)  E(2)  A(2)   (3)     I(3)  SP(4)  S(6)
                   / \
               T(1)   (2)
                      / \
                 Lf(1)   U(1)
Step 4
A(2)    (3)       I(3)    (4)     SP(4)  S(6)
        / \               / \
    T(1)   (2)        Y(2)   E(2)
           / \
      Lf(1)   U(1)
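The merge steps above follow one rule: repeatedly remove the two lowest-weight trees and put back a parent whose weight is their sum. A minimal sketch of that rule, tracking only the weights (not the tree shapes) and using a hypothetical helper `merge_steps` built on `heapq`:

```python
import heapq


def merge_steps(weights):
    """Yield the sorted list of tree weights after each Huffman merge."""
    heap = list(weights)
    heapq.heapify(heap)
    while len(heap) > 1:
        a = heapq.heappop(heap)        # lowest weight
        b = heapq.heappop(heap)        # second-lowest weight
        heapq.heappush(heap, a + b)    # parent node of the two
        yield sorted(heap)


# Weights from Step 1: Lf, U, T, Y, E, A, I, SP, S
steps = list(merge_steps([1, 1, 1, 2, 2, 2, 3, 4, 6]))
for i, queue in enumerate(steps, start=2):
    print("Step", i, queue)
```

The first three lists printed correspond to the weights shown in Steps 2 to 4; the loop then continues until a single tree remains.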
Final Huffman Tree
Repeating the merge until a single tree remains gives:
                       Root(22)
             _____________|_____________
           0|                         1|
           (9)                        (13)
        ____|____                  ____|____
      0|       1|                0|       1|
     SP(4)     (5)              S(6)      (7)
             ___|___                    ___|___
           0|     1|                  0|     1|
           A(2)   (3)                I(3)    (4)
                 __|__                      __|__
               0|   1|                    0|   1|
              T(1)  (2)                  Y(2)  E(2)
                   __|__
                 0|   1|
               Lf(1)  U(1)
| Character | Huffman Code | Count |
|---|---|---|
| S | 10 | 6 |
| Space | 00 | 4 |
| I | 110 | 3 |
| A | 010 | 2 |
| E | 1111 | 2 |
| Y | 1110 | 2 |
| T | 0110 | 1 |
| U | 01111 | 1 |
| Linefeed | 01110 | 1 |
| Encoding Method | Total Number of Bits |
|---|---|
| Fixed-width encoding (7-bit ASCII) | $22 \times 7 = 154$ |
| Huffman coding | $6 \times 2 + 4 \times 2 + 3 \times 3 + 2 \times 3 + 2 \times 4 + 2 \times 4 + 1 \times 4 + 1 \times 5 + 1 \times 5 = 65$ |
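The whole procedure can be sketched end to end in a few lines. This is one possible implementation under stated assumptions, not necessarily the one these notes have in mind: the names `huffman_codes` and `decode` are hypothetical, the message is assumed to end with a linefeed, and ties between equal weights are broken by insertion order, so individual code words may differ from a hand-built table even when the total length is the same.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(freq):
    """Return {symbol: bit string} for a {symbol: count} mapping."""
    tiebreak = count()  # keeps heap entries comparable when weights tie
    heap = [(w, next(tiebreak), sym) for sym, w in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:                      # single-symbol edge case
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Internal nodes are (left, right) tuples; leaves are symbols.
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))
    codes = {}
    stack = [(heap[0][2], "")]
    while stack:
        node, prefix = stack.pop()
        if isinstance(node, tuple):
            stack.append((node[0], prefix + "0"))   # left edge = 0
            stack.append((node[1], prefix + "1"))   # right edge = 1
        else:
            codes[node] = prefix
    return codes


def decode(bits, codes):
    """Decode a bit string; works because no code is a prefix of another."""
    inverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)


message = "SUSIE SAYS IT IS EASY\n"
codes = huffman_codes(Counter(message))
encoded = "".join(codes[ch] for ch in message)

huffman_bits = len(encoded)
fixed_bits = 7 * len(message)
print(huffman_bits, "bits with Huffman coding vs", fixed_bits, "bits fixed-width")
```

Decoding needs no separators between code words: the decoder emits a character as soon as the bits read so far match a code, which is safe only because the codes are prefix-free.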