Understanding Huffman Coding in detail

Board Infinity

Introduction

In this article you will learn how Huffman Coding works. Example code in C++ is also provided. Huffman coding is a lossless compression technique: data is compressed without losing any of the information. It was developed by David Huffman. Huffman Coding is most helpful for compressing data in which some characters occur much more frequently than others.

How It Works

A character normally takes up 8 bits. A string of 15 characters therefore needs 8 * 15 = 120 bits. Huffman coding reduces this size. It first builds a tree from the frequency of each character and then derives a code for each character from that tree. Encoded data must later be decoded, and decoding uses the same tree.
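The frequency information mentioned above can be gathered in a single pass over the input. The following is a minimal sketch; the function name countFrequencies is a hypothetical helper chosen for illustration and is not part of the article's listing.

```cpp
#include <map>
#include <string>

// Counts how often each character occurs in the input text.
// countFrequencies is a hypothetical name used only for this sketch.
std::map<char, unsigned> countFrequencies(const std::string& text) {
    std::map<char, unsigned> freq;
    for (char c : text) {
        ++freq[c];  // operator[] creates the entry with 0 on first use
    }
    return freq;
}
```

The resulting map is the input that the tree-building step would start from.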

Huffman coding avoids ambiguity during decoding by using prefix codes: the code assigned to one character must never be a prefix of the code assigned to another character. The tree built from the character frequencies maintains this property, because every character sits at a leaf of the tree.
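The prefix rule can be checked mechanically. The following is a minimal sketch, assuming the codes are held as strings of '0' and '1'; the function name isPrefixFree is a hypothetical helper used for illustration only.

```cpp
#include <string>
#include <vector>

// Returns true when no code in the list is a prefix of another code.
// isPrefixFree is a hypothetical helper name, not part of the article's code.
bool isPrefixFree(const std::vector<std::string>& codes) {
    for (std::size_t i = 0; i < codes.size(); ++i) {
        for (std::size_t j = 0; j < codes.size(); ++j) {
            if (i == j) continue;
            // compare(0, n, s) == 0 means codes[j] starts with codes[i]
            if (codes[j].size() >= codes[i].size() &&
                codes[j].compare(0, codes[i].size(), codes[i]) == 0) {
                return false;
            }
        }
    }
    return true;
}
```

A set such as 0, 11, 100, 101 passes this check, while a set such as 0, 01, 10, 010 fails it because 0 is a prefix of 01.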

Two different types of encoding exist:

Fixed-Length Encoding: a binary code of the same length is used to represent each character.
Variable-Length Encoding: a binary code of varying length is used to represent each character.

Huffman coding is a variable-length encoding. Consider a 15-character string made up of the characters A, B, C and D, with the frequencies and codes shown below:

Character	Frequency	Code	Size
A	5	11	5*2 = 10
B	1	100	1*3 = 3
C	6	0	6*1 = 6
D	3	101	3*3 = 9
4 * 8 = 32 bits	15 bits		28 bits

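From this example it is clear that the string needed a total of 120 bits before encoding. After encoding, the size is reduced to 32 + 15 + 28 = 75 bits: 32 bits for the four characters, 15 bits for the frequencies, and 28 bits for the encoded string itself.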
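The arithmetic in the table can be reproduced with a few lines of code. This is a minimal sketch; the CodeRow struct and the functions encodedBits and fixedBits are hypothetical names introduced here for illustration.

```cpp
#include <string>
#include <vector>

// One row of the table: a character, how often it occurs, and its code.
struct CodeRow {
    char symbol;
    unsigned freq;
    std::string code;
};

// Bits needed for the encoded string alone: sum of frequency * code length.
unsigned encodedBits(const std::vector<CodeRow>& rows) {
    unsigned total = 0;
    for (const CodeRow& r : rows) {
        total += r.freq * static_cast<unsigned>(r.code.size());
    }
    return total;
}

// Bits needed when every character takes a fixed 8 bits.
unsigned fixedBits(const std::vector<CodeRow>& rows) {
    unsigned count = 0;
    for (const CodeRow& r : rows) {
        count += r.freq;
    }
    return 8 * count;
}
```

For the four rows of the table, encodedBits gives 28 and fixedBits gives 120, matching the figures above.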

The following steps are used to build the Huffman tree:

Create a priority queue Q containing each unique character.
Order the characters by increasing frequency.
While more than one node remains in Q:
    Create a newNode.
    Extract the minimum value from Q and assign it to the left child of newNode.
    Extract the minimum value from Q and assign it to the right child of newNode.
    Add these two minimum values and assign the sum to the value of newNode.
    Insert newNode back into Q.
Return the last remaining node, which is the rootNode of the tree.
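The steps above can be sketched compactly in C++. Note that this sketch uses std::priority_queue from the standard library in place of the hand-written min heap shown in the Code section, and that the names Node, ByFreq, collectCodes and huffmanCodes are assumptions made for illustration. Left branches are labelled 0 and right branches 1, which is a common convention rather than a requirement.

```cpp
#include <map>
#include <memory>
#include <queue>
#include <string>
#include <vector>

// Leaves carry a character; internal nodes carry only a summed frequency.
struct Node {
    char item;
    unsigned freq;
    std::shared_ptr<Node> left, right;
};

// Comparator that puts the node with the smallest frequency on top.
struct ByFreq {
    bool operator()(const std::shared_ptr<Node>& a,
                    const std::shared_ptr<Node>& b) const {
        return a->freq > b->freq;
    }
};

// Walks the tree, appending '0' for a left branch and '1' for a right branch.
void collectCodes(const std::shared_ptr<Node>& n, const std::string& prefix,
                  std::map<char, std::string>& out) {
    if (!n) return;
    if (!n->left && !n->right) {
        // A lone symbol would otherwise get an empty code, so give it "0".
        out[n->item] = prefix.empty() ? "0" : prefix;
        return;
    }
    collectCodes(n->left, prefix + "0", out);
    collectCodes(n->right, prefix + "1", out);
}

// Builds the tree from character frequencies and returns each character's code.
std::map<char, std::string> huffmanCodes(const std::map<char, unsigned>& freqs) {
    std::priority_queue<std::shared_ptr<Node>,
                        std::vector<std::shared_ptr<Node>>, ByFreq> q;
    for (const auto& kv : freqs) {
        q.push(std::make_shared<Node>(Node{kv.first, kv.second, nullptr, nullptr}));
    }
    // Repeatedly merge the two least frequent nodes until one root remains.
    while (q.size() > 1) {
        auto left = q.top(); q.pop();
        auto right = q.top(); q.pop();
        q.push(std::make_shared<Node>(
            Node{'\0', left->freq + right->freq, left, right}));
    }
    std::map<char, std::string> codes;
    if (!q.empty()) collectCodes(q.top(), "", codes);
    return codes;
}
```

With the frequencies A = 5, B = 1, C = 6 and D = 3, this sketch produces the codes A = 11, B = 100, C = 0 and D = 101. When two nodes have equal frequency, the order in which they are merged is not fixed, so other inputs may yield different but equally short codes.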

Decoding

If the codes are not prefix codes, an encoded message such as "00101001" can be decoded in more than one way. For example, with the codes a = 0, b = 010, c = 01 and d = 10, the message can be read in at least two different ways:

Code '0 0 10 10 01' -> aaddc
Code '0 010 10 01' -> abdc

The issue with arbitrary variable-length encoding is that a message may not decode to a single, unique text. To ensure unambiguous decoding, the codes assigned to the characters must be prefix codes.
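With prefix codes, decoding can proceed bit by bit: as soon as the bits read so far match a code, that character is emitted and matching starts afresh. The following is a minimal sketch that looks codes up in a table rather than walking the tree; the function name decodePrefix and the table layout (code string mapped to character) are assumptions made for illustration.

```cpp
#include <map>
#include <string>

// Decodes a bit string using a prefix-free table that maps code -> character.
// Trailing bits that do not complete a code are ignored in this sketch.
std::string decodePrefix(const std::string& bits,
                         const std::map<std::string, char>& table) {
    std::string out;
    std::string current;
    for (char bit : bits) {
        current += bit;
        auto it = table.find(current);
        if (it != table.end()) {  // a complete code has been read
            out += it->second;
            current.clear();
        }
    }
    return out;
}
```

Because no code is a prefix of another, the first match is always the only possible match, so the message has exactly one reading. For instance, with the codes C = 0, A = 11, B = 100 and D = 101, the bits 0111001010 decode to CABDC.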

Code

// Min-heap building blocks for Huffman coding: node type, heap container and heapify.
#include <cstdlib>   // malloc
#include <iostream>
using namespace std;
#define MAX_TREE_HT 50

// A tree node: a character, its frequency and the two children
struct MinHNode {
  unsigned freq;
  char item;
  struct MinHNode *left, *right;
};

// A min heap stored as an array of node pointers
struct MinH {
  unsigned size;
  unsigned capacity;
  struct MinHNode **array;
};

// Allocate a new leaf node for the given character and frequency
struct MinHNode *newNode(char item, unsigned freq) {
  struct MinHNode *temp = (struct MinHNode *)malloc(sizeof(struct MinHNode));

  temp->left = temp->right = NULL;
  temp->item = item;
  temp->freq = freq;

  return temp;
}

// Create min heap using given capacity
struct MinH *createMinH(unsigned capacity) {
  struct MinH *minHeap = (struct MinH *)malloc(sizeof(struct MinH));
  minHeap->size = 0;
  minHeap->capacity = capacity;
  minHeap->array = (struct MinHNode **)malloc(minHeap->capacity * sizeof(struct MinHNode *));
  return minHeap;
}

// Print the bits of a code stored in arr[0..n-1]
void printArray(int arr[], int n) {
  int i;
  for (i = 0; i < n; ++i)
    cout << arr[i];

  cout << "\n";
}

// Swap two node pointers in the heap array
void swapMinHNode(struct MinHNode **a, struct MinHNode **b) {
  struct MinHNode *t = *a;
  *a = *b;
  *b = t;
}

// Restore the min-heap property for the subtree rooted at idx
void minHeapify(struct MinH *minHeap, int idx) {
  int smallest = idx;
  int left = 2 * idx + 1;
  int right = 2 * idx + 2;

  if (left < minHeap->size && minHeap->array[left]->freq < minHeap->array[smallest]->freq)
    smallest = left;

  if (right < minHeap->size && minHeap->array[right]->freq < minHeap->array[smallest]->freq)
    smallest = right;

  // If a child is smaller, swap it up and continue down the heap
  if (smallest != idx) {
    swapMinHNode(&minHeap->array[smallest],
                 &minHeap->array[idx]);
    minHeapify(minHeap, smallest);
  }
}


Complexity Analysis

Encoding each unique character based on its frequency takes O(n log n) time, where n is the number of unique characters. Extracting the minimum frequency from the priority queue happens 2*(n-1) times, and each extraction costs O(log n). Hence, the overall complexity of Huffman Coding is O(n log n).
