Huffman Coding Algorithm

March 17, 2025 | 13 min read

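Huffman coding is a technique for compressing data so that it takes up less space without losing any information. It is named after David Huffman, who first devised it. Data that contains frequently repeated characters is commonly compressed with Huffman coding.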

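Huffman coding is a well-known greedy algorithm: at every step it merges the two lowest-frequency nodes, and as a result the length of the code assigned to a character depends on that character's frequency. The most frequent characters receive the shortest variable-length codes, and the least frequent receive the longest. Because it is a variable-length code, different characters in the input bit stream can be assigned codes of different lengths.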

Prefix Rule

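Essentially, this rule states that the code assigned to one character must not be a prefix of the code assigned to any other character. If the rule is violated, decoding with the resulting Huffman tree can become ambiguous.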

Let us look at an example to understand this rule better. Suppose each character is assigned a code as follows:

    a - 0
    b - 1
    c - 01

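Suppose the generated bit stream is 001. Because the code for a (0) is a prefix of the code for c (01), the stream can be decoded in two different ways: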

    0 0 1 = aab
    0 01  = ac

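What Is the Huffman Coding Process?

Huffman codes for the distinct characters in a data stream are obtained in two main steps:

First, build a Huffman tree using only the distinct characters in the data stream.
Second, traverse the Huffman tree to assign a code to each character, then use these codes to encode the given text.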

Steps of Huffman Coding

Steps for building a Huffman tree from the given characters:

Input:
string str = "abbcdbccdaabbeeebeab"

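If Huffman coding is used to compress this data, the following must be determined so that it can be decoded:

The Huffman code for each character
The length of the Huffman-encoded message in bits, and the average code length
The last two are found with these formulas: encoded length (bits) = sum over all characters of (frequency × code length), and average code length = encoded length ÷ total number of characters.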

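How Do You Build a Huffman Tree from the Input Characters?

First, determine the frequency of each character in the data stream.

Character	Frequency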
a	4
b	7
c	3
d	2
e	4

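1. Sort the characters in ascending order of frequency and keep them in a priority queue Q (a min-heap).
2. Create a leaf node for each distinct character in the data stream, storing the character and its frequency.
3. Remove the two nodes with the lowest frequencies and create a new internal node whose frequency is the sum of the two.
   - The first node removed becomes the left child and the second becomes the right child, because the left side of a node should always hold the smaller frequency.
4. Add the new node to the min-heap.
5. Repeat steps 3 and 4 until only one node remains in the heap, that is, until every character is represented by a node in the tree. When only the root node is left, the tree is complete.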

Huffman Coding Example

Let's walk through the algorithm with an example.

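Step 1: Build a min-heap in which each node is the root of a tree containing a single node. The heap holds 5 nodes, one for each distinct character in the given data stream.

Step 2: Extract the two minimum-frequency nodes from the min-heap (d with 2 and c with 3). Join them under a new internal node with frequency 2 + 3 = 5.

The min-heap now contains 4 nodes: 3 are roots of single-element trees, and 1 is the root of a tree with two elements.

Step 3: In the same way, extract the next two minimum-frequency nodes (a and e, both with frequency 4) and join them under a new internal node with frequency 4 + 4 = 8.

The min-heap now contains three nodes: one is the root of a single-element tree (b), and two are roots of trees with multiple elements.

Step 4: Extract the two minimum-frequency nodes (the internal node with frequency 5 and b with 7) and join them under a new internal node with frequency 5 + 7 = 12.

When building the Huffman tree, make sure the smaller value always goes on the left and the second value on the right. The figure below shows the tree formed so far.

Step 5: Extract the last two nodes (8 and 12) and join them under a new internal node with frequency 8 + 12 = 20.

Continue until all distinct characters have been added to the tree. The figure above shows the Huffman tree created for the given characters.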

Now, for every non-leaf node, assign 0 to the left edge and 1 to the right edge to form the code of each letter.

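Rules for Assigning Edge Weights

If you assign weight 0 to the left edge, assign weight 1 to the right edge.
If you assign weight 1 to the left edge, assign weight 0 to the right edge.
Either convention can be used.
However, the same convention must be followed when decoding with the tree.

After the weights are assigned, the modified tree looks like this: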

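Reading the Codes

To obtain the Huffman code of each character from the generated tree, traverse the tree from the root until you reach the leaf node that holds that character.
Record the edge weights along the path; that sequence of bits becomes the code of the character stored in the leaf.
The following example makes this concrete.
To get the code of every character in the figure above, we traverse the entire tree until all leaf nodes have been covered.
The tree is therefore used to derive the code of every leaf. The table below lists the code for each character: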

Character	Frequency/Count	Code
a	4	01
b	7	11
c	3	101
d	2	100
e	4	00

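Below is an implementation in C. Note that its driver (main) uses six characters, a to f, with frequencies 5, 9, 12, 13, 16 and 45, rather than the string from the worked example above.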

// C program for Huffman Coding
#include <stdio.h>
#include <stdlib.h>

// This constant can be avoided by explicitly
// calculating height of Huffman Tree
#define MAX_TREE_HT 100

// A Huffman tree node
struct MinHeapNode {

	// One of the input characters
	char data;

	// Frequency of the character
	unsigned freq;

	// Left and right child of this node
	struct MinHeapNode *left, *right;
};

// A Min Heap: Collection of
// min-heap (or Huffman tree) nodes
struct MinHeap {

	// Current size of min heap
	unsigned size;

	// capacity of min heap
	unsigned capacity;

	// Array of minheap node pointers
	struct MinHeapNode** array;
};

// A utility function to allocate a new
// min heap node with the given character
// and frequency of the character
struct MinHeapNode* newNode(char data, unsigned freq)
{
	struct MinHeapNode* temp = (struct MinHeapNode*)malloc(
		sizeof(struct MinHeapNode));

	temp->left = temp->right = NULL;
	temp->data = data;
	temp->freq = freq;

	return temp;
}

// A utility function to create
// a min heap of given capacity
struct MinHeap* createMinHeap(unsigned capacity)

{

	struct MinHeap* minHeap
		= (struct MinHeap*)malloc(sizeof(struct MinHeap));

	// current size is 0
	minHeap->size = 0;

	minHeap->capacity = capacity;

	minHeap->array = (struct MinHeapNode**)malloc(
		minHeap->capacity * sizeof(struct MinHeapNode*));
	return minHeap;
}

// A utility function to
// swap two min heap nodes
void swapMinHeapNode(struct MinHeapNode** a,
					struct MinHeapNode** b)

{

	struct MinHeapNode* t = *a;
	*a = *b;
	*b = t;
}

// The standard minHeapify function.
void minHeapify(struct MinHeap* minHeap, int idx)

{

	int smallest = idx;
	int left = 2 * idx + 1;
	int right = 2 * idx + 2;

	if (left < minHeap->size
		&& minHeap->array[left]->freq
			< minHeap->array[smallest]->freq)
		smallest = left;

	if (right < minHeap->size
		&& minHeap->array[right]->freq
			< minHeap->array[smallest]->freq)
		smallest = right;

	if (smallest != idx) {
		swapMinHeapNode(&minHeap->array[smallest],
						&minHeap->array[idx]);
		minHeapify(minHeap, smallest);
	}
}

// A utility function to check
// if size of heap is 1 or not
int isSizeOne(struct MinHeap* minHeap)
{

	return (minHeap->size == 1);
}

// A standard function to extract
// minimum value node from heap
struct MinHeapNode* extractMin(struct MinHeap* minHeap)

{

	struct MinHeapNode* temp = minHeap->array[0];
	minHeap->array[0] = minHeap->array[minHeap->size - 1];

	--minHeap->size;
	minHeapify(minHeap, 0);

	return temp;
}

// A utility function to insert
// a new node to Min Heap
void insertMinHeap(struct MinHeap* minHeap,
				struct MinHeapNode* minHeapNode)

{

	++minHeap->size;
	int i = minHeap->size - 1;

	while (i
		&& minHeapNode->freq
				< minHeap->array[(i - 1) / 2]->freq) {

		minHeap->array[i] = minHeap->array[(i - 1) / 2];
		i = (i - 1) / 2;
	}

	minHeap->array[i] = minHeapNode;
}

// A standard function to build min heap
void buildMinHeap(struct MinHeap* minHeap)

{

	int n = minHeap->size - 1;
	int i;

	for (i = (n - 1) / 2; i >= 0; --i)
		minHeapify(minHeap, i);
}

// A utility function to print an array of size n
void printArr(int arr[], int n)
{
	int i;
	for (i = 0; i < n; ++i)
		printf("%d", arr[i]);

	printf("\n");
}

// Utility function to check if this node is leaf
int isLeaf(struct MinHeapNode* root)

{

	return !(root->left) && !(root->right);
}

// Creates a min heap of capacity
// equal to size and inserts all character of
// data[] in min heap. Initially size of
// min heap is equal to capacity
struct MinHeap* createAndBuildMinHeap(char data[],
									int freq[], int size)

{

	struct MinHeap* minHeap = createMinHeap(size);

	for (int i = 0; i < size; ++i)
		minHeap->array[i] = newNode(data[i], freq[i]);

	minHeap->size = size;
	buildMinHeap(minHeap);

	return minHeap;
}

// The main function that builds Huffman tree
struct MinHeapNode* buildHuffmanTree(char data[],
									int freq[], int size)

{
	struct MinHeapNode *left, *right, *top;

	// Step 1: Create a min heap of capacity
	// equal to size. Initially, there are
	// nodes equal to size.
	struct MinHeap* minHeap
		= createAndBuildMinHeap(data, freq, size);

	// Iterate while size of heap doesn't become 1
	while (!isSizeOne(minHeap)) {

		// Step 2: Extract the two minimum
		// freq items from min heap
		left = extractMin(minHeap);
		right = extractMin(minHeap);

		// Step 3: Create a new internal
		// node with frequency equal to the
		// sum of the two nodes frequencies.
		// Make the two extracted node as
		// left and right children of this new node.
		// Add this node to the min heap
		// '$' is a special value for internal nodes, not
		// used
		top = newNode('$', left->freq + right->freq);

		top->left = left;
		top->right = right;

		insertMinHeap(minHeap, top);
	}

	// Step 4: The remaining node is the
	// root node and the tree is complete.
	return extractMin(minHeap);
}

// Prints huffman codes from the root of Huffman Tree.
// It uses arr[] to store codes
void printCodes(struct MinHeapNode* root, int arr[],
				int top)

{

	// Assign 0 to left edge and recur
	if (root->left) {

		arr[top] = 0;
		printCodes(root->left, arr, top + 1);
	}

	// Assign 1 to right edge and recur
	if (root->right) {

		arr[top] = 1;
		printCodes(root->right, arr, top + 1);
	}

	// If this is a leaf node, then
	// it contains one of the input
	// characters, print the character
	// and its code from arr[]
	if (isLeaf(root)) {

		printf("%c: ", root->data);
		printArr(arr, top);
	}
}

// The main function that builds a
// Huffman Tree and print codes by traversing
// the built Huffman Tree
void HuffmanCodes(char data[], int freq[], int size)

{
	// Construct Huffman Tree
	struct MinHeapNode* root
		= buildHuffmanTree(data, freq, size);

	// Print Huffman codes using
	// the Huffman tree built above
	int arr[MAX_TREE_HT], top = 0;

	printCodes(root, arr, top);
}
// Driver code
int main()
{
	char arr[] = { 'a', 'b', 'c', 'd', 'e', 'f' };
	int freq[] = { 5, 9, 12, 13, 16, 45 };
	int size = sizeof(arr) / sizeof(arr[0]);
	HuffmanCodes(arr, freq, size);
	return 0;
}

Output

f: 0
c: 100
d: 101
a: 1100
b: 1101
e: 111

Java implementation of the above code

import java.util.Comparator;
import java.util.PriorityQueue;

class Huffman {

	// Recursive function that prints the Huffman code
	// of every leaf by traversing the tree.
	// Here s is the code built so far on the path from the root.
	public static void printCode(HuffmanNode root, String s)
	{
		// an empty tree has no codes to print
		if (root == null)
			return;

		// base case: if the left and right children are null,
		// this is a leaf node, so print its character and
		// the code s generated by traversing the tree.
		if (root.left == null && root.right == null) {

			// c is the character in the node
			System.out.println(root.c + ": " + s);

			return;
		}

		// going left appends "0" to the code,
		// going right appends "1".
		printCode(root.left, s + "0");
		printCode(root.right, s + "1");
	}

	// main function
	public static void main(String[] args)
	{
		// number of characters.
		int n = 6;
		char[] charArray = { 'a', 'b', 'c', 'd', 'e', 'f' };
		int[] charfreq = { 5, 9, 12, 13, 16, 45 };

		// creating a priority queue q.
		// makes a min-priority queue(min-heap).
		PriorityQueue<HuffmanNode> q
			= new PriorityQueue<HuffmanNode>(
				n, new MyComparator());

		for (int i = 0; i < n; i++) {

			// creating a Huffman node object
			// and add it to the priority queue.
			HuffmanNode hn = new HuffmanNode();

			hn.c = charArray[i];
			hn.data = charfreq[i];

			hn.left = null;
			hn.right = null;

			// add functions adds
			// the huffman node to the queue.
			q.add(hn);
		}

		// create a root node
		HuffmanNode root = null;

		// Here we will extract the two minimum value
		// from the heap each time until
		// its size reduces to 1, extract until
		// all the nodes are extracted.
		while (q.size() > 1) {

			// extract the node with the lowest frequency
			HuffmanNode x = q.poll();

			// extract the node with the next-lowest frequency
			HuffmanNode y = q.poll();

			// new internal node f whose frequency is
			// the sum of the frequencies of the two nodes
			HuffmanNode f = new HuffmanNode();
			f.data = x.data + y.data;
			f.c = '-';

			// first extracted node as left child.
			f.left = x;

			// second extracted node as the right child.
			f.right = y;

			// marking the f node as the root node.
			root = f;

			// add this node to the priority-queue.
			q.add(f);
		}

		// print the codes by traversing the tree
		printCode(root, "");
	}
}
// node class is the basic structure
// of each node present in the Huffman - tree.
class HuffmanNode {

	int data;
	char c;

	HuffmanNode left;
	HuffmanNode right;
}
// Comparator that orders nodes by their
// frequency (the data field), so the priority
// queue behaves as a min-heap on frequency.
class MyComparator implements Comparator<HuffmanNode> {
	public int compare(HuffmanNode x, HuffmanNode y)
	{
		// Integer.compare avoids the overflow that x.data - y.data could cause
		return Integer.compare(x.data, y.data);
	}
}

Output

f: 0
c: 100
d: 101
a: 1100
b: 1101
e: 111

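Explanation

The Huffman tree is built first, and the codes are then read off it by traversal: the bits collected along each path are assigned to the character stored in the leaf at its end. In this way, every distinct character in the data stream gets its own Huffman code. The time complexity is O(n log n), where n is the number of distinct characters. With n leaf nodes, extractMin() is called 2*(n - 1) times, and each call runs minHeapify(), which takes O(log n) time, so the total is O(n log n). If the input array is already sorted by frequency, a linear-time algorithm exists; we will cover it in more detail in the next article.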

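Disadvantages of Huffman Coding

This section discusses the drawbacks of Huffman coding and why it is not always the best choice:

Huffman coding is not ideal unless the probability (frequency) of every character is a negative power of 2; otherwise each code is forced to a whole number of bits, and the average code length stays above the theoretical minimum.
Although the ideal can be approached by grouping symbols into blocks and extending the alphabet, this blocking approach requires handling a much larger alphabet. So Huffman coding is not always very efficient.
Although there are many efficient ways to count the frequency of each symbol or character, rebuilding the entire tree for every symbol can be very time-consuming. This is typically the case when the alphabet is large and the probability distribution changes rapidly from symbol to symbol.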
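To put a number on the first point, the arithmetic below compares the average code length from the worked example with the Shannon entropy of the same frequencies, which is the lower bound for any prefix code. The entropy figure is our own calculation, not part of the original text, and logarithms are rounded to four decimal places.

```latex
p_a = p_e = \frac{4}{20} = 0.20, \quad p_b = \frac{7}{20} = 0.35, \quad p_c = \frac{3}{20} = 0.15, \quad p_d = \frac{2}{20} = 0.10

H = -\sum_i p_i \log_2 p_i = 2(0.20)(2.3219) + 0.35(1.5146) + 0.15(2.7370) + 0.10(3.3219) \approx 2.20 \text{ bits/char}

\bar{L} = \frac{\sum_i f_i l_i}{\sum_i f_i} = \frac{4 \cdot 2 + 7 \cdot 2 + 3 \cdot 3 + 2 \cdot 3 + 4 \cdot 2}{20} = \frac{45}{20} = 2.25 \text{ bits/char}
```

The gap of about 0.05 bits per character arises because none of these probabilities is a negative power of 2.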

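Greedy Huffman Code Construction Algorithm

Huffman developed a greedy technique that generates a Huffman code, an optimal prefix code, for each distinct character in the input data stream.
The method builds the Huffman tree bottom-up, each time merging the two nodes with the lowest frequencies.
The method is called greedy because each step makes the locally best choice, and as a result each character's code length depends on how often it occurs in the data stream: a character with a short code is a common element of the data.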

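Applications of Huffman Coding

Here are some practical uses of Huffman coding:
Traditional compression formats such as PKZIP and GZIP commonly use Huffman coding.
Huffman coding is used in fax and text transmission because it reduces file size and speeds up transmission.
Some multimedia storage formats, such as JPEG, PNG and MP3, use Huffman coding (specifically, prefix codes) to compress files.
Huffman coding is widely used in image compression.
It is especially helpful when transmitting strings that contain frequently repeated characters.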

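Conclusion

Overall, Huffman coding is useful for compressing data in which some characters appear frequently.
As we have seen, the most frequent characters get the shortest codes, and the least frequent characters get the longest codes.
Huffman coding produces a variable-length code, which uses a different number of bits for each letter or symbol. When character frequencies are uneven, this beats fixed-length coding because the encoded data takes up less memory and can be transmitted faster.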
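As a rough illustration of this claim, the arithmetic below compares the example string (20 characters, 5 distinct) under a fixed-length code, assuming the smallest fixed width that can distinguish 5 symbols, and under the Huffman codes from the table earlier in the article. It ignores the space needed to transmit the code table itself.

```latex
\text{Fixed-length: } \lceil \log_2 5 \rceil = 3 \text{ bits/char}, \qquad 20 \times 3 = 60 \text{ bits}

\text{Huffman: } \sum_i f_i l_i = 4 \cdot 2 + 7 \cdot 2 + 3 \cdot 3 + 2 \cdot 3 + 4 \cdot 2 = 45 \text{ bits}

\text{Saving: } 1 - \frac{45}{60} = 25\%
```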
Read our article on greedy algorithms to learn more about them.
