Java Boyer Moore

May 12, 2025 | 9 min read

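The Boyer-Moore algorithm is a string-searching (string-matching) algorithm developed by **Robert S. Boyer** and **J Strother Moore** in 1977. It is widely used and is among the most efficient string-matching algorithms in practice; on typical inputs it is much faster than the brute-force approach. In this section, we discuss the **Boyer-Moore algorithm, its features**, and its **implementation in a Java program**. Its worst-case running time is **O(nm+s)**, where n is the text length, m is the pattern length, and s is the alphabet size. A worst case occurs for inputs such as

T=ssssssss……………ssssssss

P=psssssssss

Sequences like these can occur in images and DNA sequences.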

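Features of the Boyer-Moore Algorithm

It compares characters one at a time, from right to left;
The preprocessing phase takes **O(m+σ)** time and space, where σ is the alphabet size.
The searching phase has **O(mn)** time complexity.
In the worst case, when searching for a non-periodic pattern, it performs 3n text character comparisons.
Its best-case performance is **O(n / m)**.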
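To make the gap between the worst and best cases concrete, the following sketch counts character comparisons for a simplified, last-occurrence-only variant of the search. The class `ComparisonCounter` and its method are hypothetical names introduced for illustration, and the sketch assumes the search stops at the first match.

```java
import java.util.Arrays;

final class ComparisonCounter {
    /** Counts character comparisons made by a last-occurrence-only Boyer-Moore search. */
    static long countComparisons(String text, String pattern) {
        int n = text.length();
        int m = pattern.length();
        if (m == 0 || m > n) return 0;
        // last[c] = rightmost index of c in the pattern, or -1 if c is absent
        int[] last = new int[Character.MAX_VALUE + 1];
        Arrays.fill(last, -1);
        for (int k = 0; k < m; k++) last[pattern.charAt(k)] = k;

        long comparisons = 0;
        int i = m - 1;  // index into the text
        int j = m - 1;  // index into the pattern
        while (i < n) {
            comparisons++;
            if (text.charAt(i) == pattern.charAt(j)) {
                if (j == 0) return comparisons;  // full match found
                i--;
                j--;
            } else {
                // character jump, then restart at the end of the pattern
                i += m - Math.min(j, 1 + last[text.charAt(i)]);
                j = m - 1;
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        System.out.println(countComparisons("ssssssssssss", "psss"));
        System.out.println(countComparisons("aaaaaaaaaaaa", "bbbb"));
    }
}
```

With a 12-character text of `s` and the pattern `psss`, every alignment costs m comparisons and the pattern advances by one position, giving m(n-m+1) = 36 comparisons. With a text of `a` and the pattern `bbbb`, each alignment is rejected after a single comparison and the pattern jumps by m, giving n/m = 3 comparisons.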

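The algorithm is based on the following two heuristics

Looking-glass heuristic
Character-jump heuristic

Let's understand how the Boyer-Moore algorithm works.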

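How the Boyer-Moore Algorithm Works

The algorithm scans the characters of the pattern starting from its rightmost character and moving left. On a mismatch, or on a complete match of the pattern, it uses two precomputed functions to shift the pattern to the right along the text. These two precomputed shift functions are called the **good-suffix shift** (or **matching shift**) and the **bad-character shift** (or occurrence shift).

Note: To match the pattern, align the pattern with the text from left to right, and compare the characters from right to left.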
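The right-to-left comparison at a single alignment can be sketched as follows. This is a minimal illustration, not the full algorithm: the class `RightToLeftCompare` and the method `firstMismatch` are hypothetical names, and the sample strings are borrowed from the pattern-searching program later in this article.

```java
final class RightToLeftCompare {
    /**
     * Compares the pattern with the text at alignment s, scanning right to left.
     * Returns the pattern index of the first mismatch, or -1 on a full match.
     */
    static int firstMismatch(char[] text, char[] pattern, int s) {
        if (s < 0 || s + pattern.length > text.length) {
            throw new IllegalArgumentException("alignment out of range");
        }
        for (int j = pattern.length - 1; j >= 0; j--) {
            if (pattern[j] != text[s + j]) {
                return j;  // mismatch at pattern index j
            }
        }
        return -1;  // every character matched
    }

    public static void main(String[] args) {
        char[] text = "abcfefabddef".toCharArray();
        char[] pattern = "abddef".toCharArray();
        System.out.println(firstMismatch(text, pattern, 0));
        System.out.println(firstMismatch(text, pattern, 6));
    }
}
```

At alignment 0 the last two characters (`e`, `f`) match and the comparison stops at pattern index 3; at alignment 6 the whole pattern matches and the method returns -1.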

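Bad-Character Shift

When a mismatch occurs, skip alignments until one of the following holds

The mismatch becomes a match
P moves past the mismatched character

For example, consider a text (T) and a pattern (P).

Let's start matching the pattern.

Step 1: Align the characters from left to right and compare them from right to left.

We see that the last three characters of P match the characters in T. The fourth character from the right (T) does not match. Following the rule discussed above, we skip alignments until the mismatch becomes a match, which happens when the seventh character of P (C) lines up with the C in T.

Step 2: Skip three characters to the right to match the pattern.

After the shift, compare the characters from right to left again. The first character matches. We observe that the character A does not occur to the left in P. In this case, P moves past the mismatched character (A) in T.

Step 3: Move P past the mismatched character.

The pattern is matched.

Note: The bad-character shift can be negative. This causes no harm, because when shifting, the Boyer-Moore algorithm applies the maximum of the good-suffix shift and the bad-character shift (the number of characters skipped).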
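The note about negative shifts can be illustrated with a small sketch that uses a plain last-occurrence table. The class `BadCharacterShift`, its method names, and the pattern `CCTTTTGC` are assumptions chosen for illustration; they are not taken from the example above.

```java
import java.util.Arrays;

final class BadCharacterShift {
    /** last[c] = rightmost index of c in the pattern, or -1 if c does not occur. */
    static int[] lastOccurrence(String pattern) {
        int[] last = new int[Character.MAX_VALUE + 1];
        Arrays.fill(last, -1);
        for (int k = 0; k < pattern.length(); k++) {
            last[pattern.charAt(k)] = k;
        }
        return last;
    }

    /** Shift suggested by the table alone; it can be zero or negative. */
    static int rawShift(String pattern, int j, char textChar) {
        return j - lastOccurrence(pattern)[textChar];
    }

    /** Shift clamped to at least 1, so the pattern never moves backwards. */
    static int safeShift(String pattern, int j, char textChar) {
        return Math.max(1, rawShift(pattern, j, textChar));
    }

    public static void main(String[] args) {
        String pattern = "CCTTTTGC";
        // Mismatch at pattern index 4 against the text character 'C'
        System.out.println(rawShift(pattern, 4, 'C'));
        System.out.println(safeShift(pattern, 4, 'C'));
        // Mismatch at pattern index 4 against 'A', which is absent from the pattern
        System.out.println(rawShift(pattern, 4, 'A'));
    }
}
```

Because the rightmost `C` sits at index 7, to the right of the mismatch, the raw shift is 4 - 7 = -3. Clamping it to 1, or taking the maximum with the good-suffix shift, keeps the search moving forward. For the absent character `A` the raw shift is 4 - (-1) = 5, which moves P past the mismatched character.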

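Good-Suffix Shift

Let t be the substring matched by the inner loop. Skip characters until

There are no mismatches between P and t.
P moves past t.

For example, consider the following pattern.

Step 1: Compare the characters from right to left. We see that the last three characters of P match the characters of T marked **t**.

Step 2: Skip characters until there are no mismatches between P and t. We observe that the first five characters of P, read from left to right (C T T A C), match the last five characters of t.

Step 3: Skip three alignments to obtain a match. Hence, we get a match.

The two shift functions above are stored as follows

The good-suffix shift function is stored in a table named **bmGs** of size **m+1**. The table **bmGs** is computed with the help of an auxiliary table named **suff**.

The bad-character shift function is stored in a table **bmBc** of size σ, with one entry for each character c in the alphabet ∑.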
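One common way to compute **bmGs** from **suff** is sketched below. It rests on several assumptions: `suff[i]` is taken to be the length of the longest common suffix of P[0..i] and P, it is computed naively in O(m²) for clarity, and the table has m entries indexed by the mismatch position rather than m+1. The class `GoodSuffixTable` and the sample pattern `GCAGAGAG` are illustrative choices.

```java
import java.util.Arrays;

final class GoodSuffixTable {
    /** suff[i] = length of the longest common suffix of x[0..i] and x (naive version). */
    static int[] suffixes(char[] x) {
        int m = x.length;
        int[] suff = new int[m];
        for (int i = 0; i < m; i++) {
            int k = 0;
            while (k <= i && x[i - k] == x[m - 1 - k]) {
                k++;
            }
            suff[i] = k;
        }
        return suff;
    }

    /** bmGs[j] = shift to apply when the mismatch occurs at pattern index j. */
    static int[] goodSuffixShifts(char[] x) {
        int m = x.length;
        int[] suff = suffixes(x);
        int[] bmGs = new int[m];
        Arrays.fill(bmGs, m);  // default: move the whole pattern past the window

        // Case 1: a prefix of x is also a suffix of x (suff[i] == i + 1)
        int j = 0;
        for (int i = m - 1; i >= 0; i--) {
            if (suff[i] == i + 1) {
                for (; j < m - 1 - i; j++) {
                    if (bmGs[j] == m) {
                        bmGs[j] = m - 1 - i;
                    }
                }
            }
        }
        // Case 2: the matched suffix occurs again further left inside x
        for (int i = 0; i <= m - 2; i++) {
            bmGs[m - 1 - suff[i]] = m - 1 - i;
        }
        return bmGs;
    }

    public static void main(String[] args) {
        char[] pattern = "GCAGAGAG".toCharArray();
        System.out.println(Arrays.toString(suffixes(pattern)));
        System.out.println(Arrays.toString(goodSuffixShifts(pattern)));
    }
}
```

For `GCAGAGAG` this yields suff = [1, 0, 0, 2, 0, 4, 0, 8] and bmGs = [7, 7, 7, 2, 7, 4, 7, 1]. A linear-time computation of `suff` exists, but the quadratic version keeps the definition visible.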

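Boyer-Moore Pattern Matching Example

Consider the following pattern.

Let's start matching.

Step 1: Compare the characters from right to left. We see that the first character compared does not match, i.e., G does not match T.

Step 2: Now skip characters until a match is found. A match is found after six characters. The good-suffix shift rule does not apply here.

bc: 6, gs: 0

According to the bad-character shift, P moves past the mismatched character (i.e., G).

Step 3: Again, compare the characters from right to left. We see that the last three characters of P (t) match T, and the fourth does not.

Here we can apply both rules, the bad-character shift and the good-suffix shift. The bad-character shift moves P by only one character, skipping no alignments, while the good-suffix shift skips two alignments. Since the algorithm always takes the larger skip, we apply the good-suffix shift and skip two alignments.

bc: 0, gs: 2

After shifting by three alignments, we get

bc: 2, gs: 7

Here, we observe that C does not occur to the left in P. The bad-character shift skips **two** alignments and the good-suffix shift skips **seven** alignments.

Step 4: After shifting, we see that the string is matched.

In the example above, we skipped 15 alignments, and 11 characters of T were ignored.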
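The total of 15 skipped alignments follows from taking the larger of the two values at each step. A minimal sketch of that bookkeeping, with the `bc` and `gs` values copied from the three steps above (the class and method names are hypothetical):

```java
final class ShiftChoice {
    /** Sums, step by step, the larger of the bad-character and good-suffix skips. */
    static int totalSkipped(int[] bc, int[] gs) {
        if (bc.length != gs.length) {
            throw new IllegalArgumentException("one bc and one gs value per step expected");
        }
        int total = 0;
        for (int step = 0; step < bc.length; step++) {
            total += Math.max(bc[step], gs[step]);
        }
        return total;
    }

    public static void main(String[] args) {
        int[] bc = {6, 0, 2};  // bad-character skips from the example
        int[] gs = {0, 2, 7};  // good-suffix skips from the example
        System.out.println(totalSkipped(bc, gs));  // 6 + 2 + 7 = 15
    }
}
```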

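Boyer-Moore Preprocessing Phase

The skips for the text **T: A A T C A A T A G C** and the pattern **P: T C G C** can be precomputed in a table. Here, we use the bad-character shift function.

The table defines the number of alignments (characters) to skip.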
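One plausible way to build such a table for **P: T C G C** is sketched below. It assumes that a skip counts the alignments passed over (the shift minus one), that the shift aligns the mismatched text character with its rightmost occurrence to the left of the mismatch, and that the alphabet is A, C, G, T. The class `BadCharSkipTable` and the `MATCH` marker are hypothetical.

```java
import java.util.Arrays;

final class BadCharSkipTable {
    /** Marks cells where the text character equals the pattern character (no mismatch). */
    static final int MATCH = -1;

    /** Row of the table for text character c: one entry per pattern position. */
    static int[] skips(String pattern, char c) {
        int m = pattern.length();
        int[] row = new int[m];
        int lastSeen = -1;  // rightmost index to the left of j where c occurs
        for (int j = 0; j < m; j++) {
            if (pattern.charAt(j) == c) {
                row[j] = MATCH;
                lastSeen = j;
            } else {
                // shift = j - lastSeen, so the alignments skipped are shift - 1
                row[j] = j - lastSeen - 1;
            }
        }
        return row;
    }

    public static void main(String[] args) {
        String pattern = "TCGC";
        for (char c : "ACGT".toCharArray()) {
            System.out.println(c + " " + Arrays.toString(skips(pattern, c)));
        }
    }
}
```

Under these assumptions the row for `A`, which never occurs in P, is [0, 1, 2, 3]: the further right the mismatch, the more alignments can be skipped.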

Boyer-Moore Algorithm Pseudocode

BoyerMooreMatch(T, P, Σ)
L <- lastOccurenceFunction(P, Σ)
i <- m-1
j <- m-1
repeat
    if T[i] = P[j]
        if j = 0
            return i {match at i}
        else
            i <- i-1
            j <- j-1
    else
        {character-jump}
        l <- L[T[i]]
        i <- i + m - min(j, 1+l)
        j <- m-1
until i > n-1
return -1 {no match}

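Pattern Searching Java Program

Let's look at a Java program for pattern searching. The program below implements both the brute-force string search and a simplified Boyer-Moore search based on the last-occurrence (character-jump) rule.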

PatternSearchingExample.java

import java.util.HashMap;
import java.util.Map;

public class PatternSearchingExample
{
    /**
     * Brute-force search.
     * @param text -- the text to scan for the pattern
     * @param pattern -- the pattern to look for inside the text
     * @return -- index of the first match, or -1 if not found
     */
    public static int findBruteForce(char[] text, char[] pattern)
    {
        System.out.println("Brute force looking for " + String.valueOf(pattern) + " in " + String.valueOf(text));
        int n = text.length;
        int m = pattern.length;
        //an empty pattern matches at index 0
        if (m == 0) return 0;
        //try every alignment i of the pattern in the text: O(n)
        for (int i = 0; i <= n - m; i++)
        {
            //k indexes the pattern; advance while the characters match: O(m)
            int k = 0;
            while (k < m && text[i + k] == pattern[k])
            {
                k++;
            }
            //reached the end of the pattern, so there is a match starting at index i
            if (k == m)
            {
                System.out.println("\tFound match in the given text at index " + i);
                return i;
            }
        }
        //no alignment matched
        System.out.println("\tNo match found in the given text.");
        return -1;
    }

    /**
     * Simplified Boyer-Moore search using only the last-occurrence (character-jump) rule.
     * @param text -- the text to scan for the pattern
     * @param pattern -- the pattern to look for inside the text
     * @return -- index of the first match, or -1 if not found
     */
    public static int findBoyerMoore(char[] text, char[] pattern)
    {
        System.out.println("Boyer-Moore looking for " + String.valueOf(pattern) + " in " + String.valueOf(text));
        int n = text.length;
        int m = pattern.length;
        //an empty pattern matches at index 0
        if (m == 0) return 0;
        //map each character to its last position in the pattern: O(n + m)
        Map<Character, Integer> last = new HashMap<>();
        for (int i = 0; i < n; i++)
        {
            //every character of the text defaults to -1 (absent from the pattern)
            last.put(text[i], -1);
        }
        for (int i = 0; i < m; i++)
        {
            //later positions overwrite earlier ones, leaving the last occurrence
            last.put(pattern[i], i);
        }
        //start with the end of the pattern aligned at index m-1 of the text
        int i = m - 1;  //index into the text
        int k = m - 1;  //index into the pattern
        while (i < n)
        {
            if (text[i] == pattern[k])
            {
                //characters match: done if the whole pattern matched, otherwise move left
                if (k == 0)
                {
                    System.out.println("\tFound match in the given text at index " + i);
                    return i;
                }
                i--;
                k--;
            }
            else
            {
                //character jump, then restart at the end of the pattern
                i += m - Math.min(k, 1 + last.get(text[i]));
                k = m - 1;
            }
        }
        //not found
        System.out.println("\tNo match found in the given text.");
        return -1;
    }

    public static void main(String args[])
    {
        char[] text = "abcfefabddef".toCharArray();
        char[] pattern = "abddef".toCharArray();
        //run both searches on the same input
        findBruteForce(text, pattern);
        findBoyerMoore(text, pattern);
    }
}

Output

Brute force looking for abddef in abcfefabddef
	Found match in the given text at index 6
Boyer-Moore looking for abddef in abcfefabddef
	Found match in the given text at index 6

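Boyer-Moore Java Program

The following Java program implements a Boyer-Moore search that uses only the bad-character heuristic and reports every occurrence of the pattern in the text.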

BoyerMooreImplementation.java

public class BoyerMooreImplementation
{
    //size of the alphabet; the tables assume character codes below 256
    static int NO_OF_CHARS = 256;

    static int max(int a, int b)
    {
        return (a > b) ? a : b;
    }

    //fills badchar[c] with the last index of c in the pattern, or -1 if c is absent
    static void badCharHeuristic(char[] str, int size, int[] badchar)
    {
        int i;
        for (i = 0; i < NO_OF_CHARS; i++)
            badchar[i] = -1;
        for (i = 0; i < size; i++)
            badchar[(int) str[i]] = i;
    }

    static void search(char[] txt, char[] pat)
    {
        int m = pat.length;
        int n = txt.length;
        int[] badchar = new int[NO_OF_CHARS];
        //preprocess the pattern
        badCharHeuristic(pat, m, badchar);
        //s is the shift of the pattern with respect to the text
        int s = 0;
        while (s <= (n - m))
        {
            int j = m - 1;
            //compare from right to left while the characters match
            while (j >= 0 && pat[j] == txt[s + j])
                j--;
            if (j < 0)
            {
                System.out.println("Patterns occur at character = " + s);
                //align the character after the window with its last occurrence in the pattern
                s += (s + m < n) ? m - badchar[txt[s + m]] : 1;
            }
            else
                //shift by at least 1 so that a negative bad-character shift is never applied
                s += max(1, j - badchar[txt[s + j]]);
        }
    }

    public static void main(String args[])
    {
        //text in which the pattern occurs
        char[] txt = "123651266512".toCharArray();
        //pattern to search
        char[] pat = "12".toCharArray();
        search(txt, pat);
    }
}

Output

Patterns occur at character = 0
Patterns occur at character = 5
Patterns occur at character = 10

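Let's look at another Java program that implements the pattern-search logic differently, using both a bad-character table and a good-suffix table. Its test method compares the index returned by the search with an expected index and prints "Pattern Matched" when the two agree.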

BoyerMooreExample.java

public class BoyerMooreExample
{
    public static void main(String args[])
    {
        System.out.println("Matching Pattern");
        //the third argument is the expected index of the first match (-1 if absent)
        test("aabbccdef", "cde", 5);
        test("zzzzaaapppxyzabc", "pqrs", -1);
        test("mango", "ngo", 2);
        test("abc", "d", -1);
        test("catdog", "tdo", 2);
        test("pqrsabcdxyzamnop", "cdxyza", 6);
        test("cool", "", 0);
        test("", "car", -1);
    }

    public static void test(String text, String word, int exp)
    {
        char[] textC = text.toCharArray();
        char[] wordC = word.toCharArray();
        int result = bm(textC, wordC);
        if (result == exp)
            System.out.println("Pattern Matched");
        else
        {
            System.out.println("Pattern Not Matched");
            System.out.println("\ttext: " + text);
            System.out.println("\tword: " + word);
            System.out.println("\texp: " + exp + ", res: " + result);
        }
    }

    //bad-character table: distance from the last occurrence of each character to the end of the pattern
    public static int[] makeD1(char[] pat)
    {
        //256 entries cover character codes 0..255
        int[] table = new int[256];
        for (int i = 0; i < 256; i++)
            table[i] = pat.length;
        for (int i = 0; i < pat.length - 1; i++)
            table[pat[i]] = pat.length - 1 - i;
        return table;
    }

    //true if the suffix of word starting at pos is also a prefix of word
    public static boolean isPrefix(char[] word, int pos)
    {
        int suffixlen = word.length - pos;
        for (int i = 0; i < suffixlen; i++)
            if (word[i] != word[pos + i])
                return false;
        return true;
    }

    //length of the longest suffix of word that ends at word[pos]
    public static int suffixLength(char[] word, int pos)
    {
        int i = 0;
        //check the bound first so that && can short-circuit
        while (i < pos && word[pos - i] == word[word.length - 1 - i])
            i++;
        return i;
    }

    //good-suffix table
    public static int[] makeD2(char[] pat)
    {
        int[] delta2 = new int[pat.length];
        int lastPrefixIndex = pat.length - 1;
        //first pass: shifts based on prefixes of the pattern that are also suffixes
        for (int p = pat.length - 1; p >= 0; p--)
        {
            if (isPrefix(pat, p + 1))
                lastPrefixIndex = p + 1;
            delta2[p] = lastPrefixIndex + (pat.length - 1 - p);
        }
        //second pass: shifts based on suffixes that occur again inside the pattern
        for (int p = 0; p < pat.length - 1; p++)
        {
            int slen = suffixLength(pat, p);
            if (pat[p - slen] != pat[pat.length - 1 - slen])
                delta2[pat.length - 1 - slen] = pat.length - 1 - p + slen;
        }
        return delta2;
    }

    //returns the index of the first match of pat in string, or -1 if there is none
    public static int bm(char[] string, char[] pat)
    {
        int[] d1 = makeD1(pat);
        int[] d2 = makeD2(pat);
        int i = pat.length - 1;
        while (i < string.length)
        {
            int j = pat.length - 1;
            //compare from right to left
            while (j >= 0 && (string[i] == pat[j]))
            {
                i--;
                j--;
            }
            if (j < 0)
                return (i + 1);
            //take the larger of the bad-character and good-suffix shifts
            i += Math.max(d1[string[i]], d2[j]);
        }
        return -1;
    }
}

Output

Matching Pattern
Pattern Matched
Pattern Matched
Pattern Matched
Pattern Matched
Pattern Matched
Pattern Matched
Pattern Matched
Pattern Matched
