Java Huffman 编码

2025年03月17日 | 阅读 9 分钟

哈夫曼编码算法由 David A. Huffman 于 1950 年提出。它是一种 无损数据压缩 机制。它也称为 数据压缩编码。 它被广泛用于图像（JPEG 或 JPG）压缩。在本节中，我们将讨论 哈夫曼编码 和 解码， 并用 Java 程序实现其算法。

我们知道每个字符都是由 0 和 1 组成的序列，并使用 8 位存储。该机制称为 定长编码，因为每个字符都使用相同数量的固定位数存储。

这时，会有一个问题出现：是否有可能减少存储字符所需的空间量？

是的， 使用 变长编码 是可能的。在这种机制中，我们利用一些比其他字符出现频率更高的字符。在这种编码技术中，我们可以通过减少位数来表示相同文本或字符串。

哈夫曼编码

哈夫曼编码实现以下步骤。

它为所有给定字符分配变长编码。
字符的编码长度取决于它在给定文本或字符串中出现的频率。
如果一个字符出现的频率很高，它将获得最短的编码。
如果一个字符出现的频率很低，它将获得最长的编码。

哈夫曼编码遵循 前缀规则，该规则可防止在解码时产生歧义。该规则还确保分配给字符的编码不会被视为分配给任何其他字符的编码的前缀。

哈夫曼编码涉及以下两个主要步骤：

首先，根据给定的输入字符串或字符或文本构建 哈夫曼树。
通过遍历树，为每个字符分配哈夫曼编码。

让我们简要介绍上述两个步骤。

哈夫曼树

步骤 1： 对于每个字符节点，创建一个叶节点。字符的叶节点包含该字符的频率。

步骤 2： 根据频率对所有节点进行排序。

步骤 3： 可能存在两个节点频率相同的情况。在这种情况下，请执行以下操作：

创建一个新的内部节点。
该节点的频率将是具有相同频率的两个节点的频率之和。
将第一个节点标记为新创建的内部节点的左子节点，另一个节点标记为右子节点。

步骤 4： 重复步骤 2 和 3，直到所有节点形成一棵单一的树。这样，我们就得到了一棵哈夫曼树。

哈夫曼编码示例

假设我们要编码字符串 abracadabra。 确定以下内容：

所有字符的哈夫曼编码
给定字符串的平均编码长度
编码字符串的长度

(i) 所有字符的哈夫曼编码

为了确定每个字符的编码，我们首先构建 哈夫曼树。

步骤 1： 将字符与其频率配对。

(a, 5), (b, 2), (c, 1), (d, 1), (r, 2)

步骤 2： 根据频率对配对进行排序，我们得到

(c, 1), (d, 1), (b, 2) (r, 2), (a, 5)

步骤 3： 选择前两个字符并将它们组合在一个父节点下。

我们观察到父节点没有频率，所以我们必须给它分配频率。父节点的频率将是其左右子节点的频率之和，即 1+1=2。

步骤 4： 重复步骤 2 和 3，直到我们得到一棵单一的树。

我们观察到配对已经按排序（根据步骤 2）顺序排列。再次选择前两个配对并将它们组合起来。

我们观察到父节点没有频率，所以我们必须给它分配频率。父节点的频率将是其左右子节点的频率之和，即 2+2=4。

再次，我们检查配对是否按排序顺序排列。在此步骤中，我们需要对配对进行排序。

根据步骤 3，选择前两个配对并将它们组合起来，我们得到

我们观察到父节点没有频率，所以我们必须给它分配频率。父节点的频率将是其左右子节点的频率之和，即 2+4=6。

再次，我们检查配对是否按排序顺序排列。在此步骤中，我们需要对配对进行排序。排序后，树的外观如下：

根据步骤 3，选择前两个配对并将它们组合起来，我们得到

我们观察到父节点没有频率，所以我们必须给它分配频率。父节点的频率将是其左右子节点的频率之和，即 5+6=11。

因此，我们得到一棵单一的树。

最后，我们将通过上述树找到每个字符的编码。为每条边分配权重。请注意，每个左边缘的权重为 0，右边缘的权重为 1。

我们观察到输入字符仅出现在叶节点中，而内部节点具有空值。为了找到每个字符的哈夫曼编码，从根节点到需要查找编码的特定字符的叶节点遍历哈夫曼树。该表描述了每个字符的编码和编码长度。

Character	频率	编码	编码长度
A	5	0	1
B	2	111	3
C	1	1100	4
D	1	1101	4
R	2	10	2

我们观察到出现频率最高的字符获得了最短的编码长度，而出现频率最低的字符获得了最长的编码长度。

现在我们可以编码我们上面使用的字符串 (abracadabra)。

(ii) 字符串的平均编码长度

可以使用以下公式确定哈夫曼树的平均编码长度

= { (5 x 1) + (2 x 3) + (1 x 4) + (1 x 4) + (2 x 2) } / (5+2+1+1+2)

= 2.09090909

(iii) 编码字符串的长度

可以使用以下公式确定编码消息的长度

length= Total number of characters in the text x Average code length per character

= 11 x 2.09090909

= 23 位

哈夫曼编码算法

Huffman (C)
n=|C|
Q=C
for i=1 to n-1
    do
        z=allocate_Node()
        x=left[z]=Extract_Min(Q)
        y=right[z]=Extract_Min(Q)
        f[z]=f[x]+f[y]
Insert(Q,z)
return Extract_Min(Q)

哈夫曼算法是一种贪婪算法。因为在每个阶段，算法都会寻找最佳可用选项。

哈夫曼编码的时间复杂度为O(nlogn)。 其中 n 是给定文本中的字符数。

哈夫曼解码

哈夫曼解码是一种将编码数据转换为初始数据的技术。正如我们在编码中看到的，哈夫曼树是为输入字符串构建的，字符根据它们在树中的位置进行解码。解码过程如下：

从根节点开始遍历树并搜索字符。
如果我们在二叉树中向左移动，则在编码中添加0。
如果我们在二叉树中向右移动，则在编码中添加1。

子节点包含输入字符。它被分配由连续的 0 和 1 组成的编码。解码字符串的时间复杂度为O(n)， 其中 n 是字符串的长度。

哈夫曼编码和解码 Java 程序

在以下程序中，我们使用优先级队列、堆栈和树等数据结构来设计压缩和解压缩逻辑。我们将基于广泛使用的哈夫曼编码算法技术来构建我们的实用程序。

HuffmanCode.java

import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;
//defining a class that creates nodes of the tree
class Node
{
    //storing character in ch variable of type character
    Character ch;
    //storing frequency in freq variable of type int
    Integer freq;
    //initially both child (left and right) are null
    Node left = null; 
    Node right = null; 
    //creating a constructor of the Node class
    Node(Character ch, Integer freq)
    {
        this.ch = ch;
        this.freq = freq;
    }
    //creating a constructor of the Node class
    public Node(Character ch, Integer freq, Node left, Node right)
    {
        this.ch = ch;
        this.freq = freq;
        this.left = left;
        this.right = right;
    }
}
//main class
public class HuffmanCode
{
	//function to build Huffman tree
	public static void createHuffmanTree(String text)
    {
        //base case: if user does not provides string
        if (text == null || text.length() == 0) 
        {
            return;
        }
        //count the frequency of appearance of each character and store it in a map
        //creating an instance of the Map 
        Map<Character, Integer> freq = new HashMap<>();
        //loop iterates over the string and converts the text into character array
        for (char c: text.toCharArray()) 
        {
        	//storing character and their frequency into Map by invoking the put() method 
            freq.put(c, freq.getOrDefault(c, 0) + 1);
        }
        //create a priority queue that stores current nodes of the Huffman tree.
        //here a point to note that the highest priority means the lowest frequency 
        PriorityQueue<Node> pq = new PriorityQueue<>(Comparator.comparingInt(l -> l.freq));
        //loop iterate over the Map and returns a Set view of the mappings contained in this Map
        for (var entry: freq.entrySet()) 
        {
        	//creates a leaf node and add it to the queue
            pq.add(new Node(entry.getKey(), entry.getValue()));
        }
        //while loop runs until there is more than one node in the queue
        while (pq.size() != 1)
        {
            //removing the nodes having the highest priority (the lowest frequency) from the queue
            Node left = pq.poll();
            Node right = pq.poll();
            //create a new internal node with these two nodes as children and with a frequency equal to the sum of both nodes' frequencies. Add the new node to the priority queue.
            //sum up the frequency of the nodes (left and right) that we have deleted 
            int sum = left.freq + right.freq;
            //adding a new internal node (deleted nodes i.e. right and left) to the queue with a frequency that is equal to the sum of both nodes
            pq.add(new Node(null, sum, left, right));
        }
        //root stores pointer to the root of Huffman Tree
        Node root = pq.peek();
        //trace over the Huffman tree and store the Huffman codes in a map
        Map<Character, String> huffmanCode = new HashMap<>();
        encodeData(root, "", huffmanCode);
        //print the Huffman codes for the characters
        System.out.println("Huffman Codes of the characters are: " + huffmanCode);
        //prints the initial data
        System.out.println("The initial string is: " + text);
        //creating an instance of the StringBuilder class 
        StringBuilder sb = new StringBuilder();
        //loop iterate over the character array
        for (char c: text.toCharArray()) 
        {
        	//prints encoded string by getting characters 
            sb.append(huffmanCode.get(c));
        } 
        System.out.println("The encoded string is: " + sb);
        System.out.print("The decoded string is: ");
        if (isLeaf(root))
        {
            //special case: For input like a, aa, aaa, etc.
            while (root.freq-- > 0) 
            {
                System.out.print(root.ch);
            }
        }
        else 
        {
            //traverse over the Huffman tree again and this time, decode the encoded string
            int index = -1;
            while (index < sb.length() - 1) 
            {
                index = decodeData(root, index, sb);
            }
        }
    }
//traverse the Huffman Tree and store Huffman Codes in a Map
    //function that encodes the data
    public static void encodeData(Node root, String str, Map<Character, String> huffmanCode)
    {
        if (root == null) 
        {
            return;
        }
        //checks if the node is a leaf node or not
        if (isLeaf(root)) 
        {
            huffmanCode.put(root.ch, str.length() > 0 ? str : "1");
        }
        encodeData(root.left, str + '0', huffmanCode);
        encodeData(root.right, str + '1', huffmanCode);
    }
    //traverse the Huffman Tree and decode the encoded string function that decodes the encoded data
    public static int decodeData(Node root, int index, StringBuilder sb)
    {
        //checks if the root node is null or not
        if (root == null) 
        {
            return index;
        } 
        //checks if the node is a leaf node or not
        if (isLeaf(root))
        {
            System.out.print(root.ch);
            return index;
        }
        index++; 
        root = (sb.charAt(index) == '0') ? root.left : root.right;
        index = decodeData(root, index, sb);
        return index;
    }
    //function to check if the Huffman Tree contains a single node
    public static boolean isLeaf(Node root) 
    {
    	//returns true if both conditions return ture
        return root.left == null && root.right == null;
    }
    //driver code
    public static void main(String args[])
    {
        String text = "javatpoint";
        //function calling
        createHuffmanTree(text);
    }
}

输出

Huffman Codes of the characters are: {p=000, a=110, t=111, v=001, i=010, j=011, n=100, o=101}
The initial string is: javatpoint
The encoded string is: 011110001110111000101010100111
The decoded string is: javatpoint

下一主题Java Snippet Class

Java Huffman 编码

哈夫曼编码

哈夫曼树

哈夫曼编码示例

(i) 所有字符的哈夫曼编码

(ii) 字符串的平均编码长度

(iii) 编码字符串的长度

哈夫曼编码算法

哈夫曼解码

哈夫曼编码和解码 Java 程序

联系信息

关注我们

教程

面试题

在线编译器

Python

Java

.Net Framework

AI, ML and Data Science

Cloud Technology

B.Tech and MCA

Web Technology

PHP

Software Testing

Technical Interview

Java Interview

Python

Web Interview

Database Interview

B.Tech / MCA

Important Interview

Software Testing Interview

Company Interviews

Online Compilers

Multiple Choice Questions

Java Conversion

Java Misc

Java Huffman 编码

哈夫曼编码

哈夫曼树

哈夫曼编码示例

(i) 所有字符的哈夫曼编码

(ii) 字符串的平均编码长度

(iii) 编码字符串的长度

哈夫曼编码算法

哈夫曼解码

哈夫曼编码和解码 Java 程序

相关帖子

Java 程序使用两个线程打印奇偶数

Java 中的常量列表

strictfp keyword in Java

如何在 Java 中安装 SSL 证书

Java 中初始化列表

Java 中的托普利茨矩阵

Java IdentityHashMap Class

Which Java Container is Useful for Competitive Programming?

Complex Java Programs

Java 中反转字符串并保留空格

订阅 Tpoint Tech

联系信息

关注我们

教程

面试题

在线编译器