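Python Regular Expressions

28 Aug 2025 | 10 min read

A regular expression (RegEx) is a sequence of characters that defines a search pattern. We can use a RegEx to check whether a string contains a given pattern.

By matching a pattern against text, we can test whether the pattern occurs in that text, and we can break the matched text into smaller parts.

The Python Regular Expression Module

Regular expressions in Python are handled by the built-in "re" module. It can be imported with an import statement, as shown below:

Syntax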

# importing the re module
import re

How to Use RegEx in Python?

The following example searches the given string for the word "platform" and prints the start and end indices of the match.

Example

# simple example to show the use of regular expression

# importing the re module
import re

# given string
str_1 = 'Tpoint Tech: An amazing platform to learn coding'

# searching for specified pattern in given string
matched_str = re.search(r'platform', str_1) # using the search() method

# printing the starting and ending index
print('Beginning Index:', matched_str.start())
print('Ending Index:', matched_str.end())

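Output

Beginning Index: 24
Ending Index: 32

Explanation

In this example, we used the search() function of the re module to locate the specified pattern in the given string. The r prefix in r'platform' marks the pattern as a raw string, so backslashes in it are not treated as Python escape characters. We then called the start() and end() methods of the returned match object to get the start index of the match and the index just past its last character.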
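
As a quick check of the two points above, the following sketch slices the original string with start() and end(), and compares a raw string with its escaped equivalent. The variable names are illustrative and are not part of the original example.

```python
import re

text = 'Tpoint Tech: An amazing platform to learn coding'
match = re.search(r'platform', text)

# end() is exclusive, so slicing from start() to end() recovers the matched text
matched_text = text[match.start():match.end()]
print(matched_text)

# In a raw string the backslash is kept as-is: r'\d' is the two characters '\' and 'd'
raw_is_two_chars = len(r'\d') == 2
same_as_escaped = r'\d' == '\\d'
print(raw_is_two_chars, same_as_escaped)
```

Writing patterns as raw strings is a common habit because most regular expression syntax relies on backslashes.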

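RegEx Functions in Python

The re module provides several functions for finding, matching, and manipulating strings based on a specified pattern.

The following table lists some of these functions:

Function	Description
re.search()	Locates the first occurrence of the pattern in the string.
re.findall()	Finds all matches and returns them as a list.
re.compile()	Compiles a regular expression into a pattern object.
re.split()	Splits the string at each occurrence of a character or pattern.
re.sub()	Replaces all occurrences of a character or pattern with the specified replacement string.
re.escape()	Escapes special characters.

Let us work through examples of these functions.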

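re.search()

The re.search() function returns the first occurrence of the pattern. It scans the entire string and returns a match object if a match is found; otherwise, it returns None.

Let us look at an example that searches the given string for a specified pattern.

Example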

# importing the re module
import re

# given string
str_1 = 'I have been working as a Web Developer since 2023.'
regex_pattern = r"([a-zA-Z]+) (\d+)"

# searching for specified pattern in given string
matched_str = re.search(regex_pattern, str_1) # using the search() method

# checking the returned object
if matched_str:
  # printing the matched pattern details
  print('Match Found:', matched_str.group())
  print('Beginning Index:', matched_str.start())
  print('Ending Index:', matched_str.end())
else:
  print('Match Not Found')

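Output

Match Found: since 2023
Beginning Index: 39
Ending Index: 49

Explanation

In the above example, we used the search() function of the re module to find a word followed by a space and a number. The given string contains one such occurrence, "since 2023", so that match is returned. We then printed the matched text with the group() method, and the start and end indices with the start() and end() methods.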
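
Because search() returns None when nothing matches, calling group() on the result without a check raises an AttributeError. The sketch below wraps the same pattern in a small helper; find_word_and_number is a hypothetical name introduced only for this illustration.

```python
import re

# Hypothetical helper for this sketch: returns a (word, number) pair, or None.
def find_word_and_number(text):
    match = re.search(r"([a-zA-Z]+) (\d+)", text)
    if match is None:
        # search() found nothing, so there is no match object to query
        return None
    return match.group(1), match.group(2)

found = find_word_and_number('I have been working as a Web Developer since 2023.')
missing = find_word_and_number('No numbers in this sentence.')

print(found)
print(missing)
```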

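re.findall()

The re.findall() function returns a list of all non-overlapping matches in the given string. Unlike search(), which returns only the first match, findall() returns every match.

Now let us look at a simple example.

Example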

# importing the re module
import re

# given string
str_1 = """My house no. is 4567 and 
          my office no. is 8910."""

regex_pattern = r"([a-zA-Z]+) (\d+)"

# searching all occurrences in the given string
matched_str_list = re.findall(regex_pattern, str_1)

# checking the returned object
if matched_str_list:
  print(matched_str_list)
else:
  print("No match found")

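Output

[('is', '4567'), ('is', '8910')]

Explanation

In this example, we used the findall() function of the re module to find all occurrences of the specified pattern in the given string and stored the result in a variable. Because the pattern contains two capturing groups, each match is returned as a tuple of the two group values. We then printed the list of matches.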
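
The shape of the list returned by findall() depends on how many capturing groups the pattern has. The following sketch runs three variants of the pattern on a one-line version of the same sentence; the variable names are illustrative.

```python
import re

text = "My house no. is 4567 and my office no. is 8910."

# No groups: each item is the whole matched text
no_groups = re.findall(r"[a-zA-Z]+ \d+", text)

# One group: each item is the text captured by that group
one_group = re.findall(r"[a-zA-Z]+ (\d+)", text)

# Two groups: each item is a tuple with one entry per group
two_groups = re.findall(r"([a-zA-Z]+) (\d+)", text)

print(no_groups)
print(one_group)
print(two_groups)
```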

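re.compile()

The re.compile() function compiles a regular expression pattern into a reusable pattern object. We can then call its methods (such as search() and findall()) as many times as needed without rewriting the pattern.

Here is an example that shows the use of the compile() function.

Example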

# importing the re module
import re

# given string
str_1 = "Welcome to Tpoint Tech."

# using compile() function
regex_pattern = re.compile('[a-e]')

# searching all occurrences using the compiled pattern's own findall() method
matched_str_list = regex_pattern.findall(str_1)

if matched_str_list:
  print(matched_str_list)
else:
  print("No match found")

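Output

['e', 'c', 'e', 'e', 'c']

Explanation

In this example, we used the re.compile() function to compile the regular expression pattern into a reusable pattern object. We then used findall() to search for all occurrences in the given string.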
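
Compiling pays off when the same pattern is applied to many strings. As a minimal sketch, the code below compiles one pattern and reuses it through its findall() and search() methods; the word list and variable names are made up for illustration.

```python
import re

# Compile once, then reuse the pattern object
vowel_pattern = re.compile(r'[aeiou]')
words = ["Tpoint", "Tech", "rhythm"]

# findall() on the compiled pattern: every lowercase vowel in each word
vowels_per_word = {word: vowel_pattern.findall(word) for word in words}

# search() on the compiled pattern: keep only words with at least one vowel
has_vowel = [word for word in words if vowel_pattern.search(word)]

print(vowels_per_word)
print(has_vowel)
```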

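re.split()

The re.split() function splits a string wherever the regular expression pattern matches. It is similar to str.split(), but it supports more powerful patterns.

Let us look at the following example.

Example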

# importing the re module
import re

# given string
str_1 = "mango banana,apple;orange,cherry"

# pattern that matches a semicolon, a comma, or a whitespace character
regex_pattern = r'[;,\s]'

# using the split() function
matched_str_list = re.split(regex_pattern, str_1)

if matched_str_list:
  print(matched_str_list)
else:
  print("No match found")

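Output

['mango', 'banana', 'apple', 'orange', 'cherry']

Explanation

Here, we used the split() function of the re module to split the given string at every position where the specified regular expression pattern matches.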
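
When delimiters appear next to each other, a single-character pattern produces empty strings between them. The sketch below, using a made-up input string, compares that behavior with a pattern that uses + to treat a run of delimiters as one, and shows the maxsplit argument.

```python
import re

messy = "a, b;; c"

# One split per delimiter character: adjacent delimiters yield empty strings
single = re.split(r'[;,\s]', messy)

# '+' treats a run of delimiters as a single separator
collapsed = re.split(r'[;,\s]+', messy)

# maxsplit limits the number of splits performed
first_only = re.split(r'[;,\s]+', messy, maxsplit=1)

print(single)
print(collapsed)
print(first_only)
```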

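re.sub()

The re.sub() function replaces all occurrences of a regular expression pattern in a string with the specified replacement string. It works like a "find and replace" driven by a regular expression pattern.

Now let us look at the following example.

Example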

# importing the re module
import re

# given string
original_str = "Roses are red, Violets are blue."
print("Original String:", original_str)

# pattern and replacement
pattern = "red"
replacement = "white"

# using the sub() function
new_str = re.sub(pattern, replacement, original_str)
print("New String:", new_str)

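Output

Original String: Roses are red, Violets are blue.
New String: Roses are white, Violets are blue.

Explanation

In this example, we used the sub() function of the re module to find the specified pattern in the given string and replace it with the specified replacement.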
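
Two further features of sub() are often useful: the count argument limits how many occurrences are replaced, and the replacement string can refer back to capturing groups with \1, \2, and so on. The following is a small sketch with illustrative inputs.

```python
import re

line = "Roses are red, Violets are blue."

# count=1 replaces only the first occurrence
first_only = re.sub(r"are", "were", line, count=1)

# \1, \2, \3 in the replacement refer to the captured year, month, and day
iso_date = "2025-08-28"
day_first = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", iso_date)

print(first_only)
print(day_first)
```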

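re.subn()

The subn() function is another function of the re module that works like sub(). However, it returns a tuple containing the new string and the number of substitutions made in the given string.

Let us look at the following example to understand the use of the re.subn() function.

Example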

# importing the re module
import re

# given string
original_str = "This building has 4 floors. There are 3 flats on each floor."

# pattern and replacement
pattern = r'\d+'
replacement = 'many'

# using the subn() function
new_str, num_subs = re.subn(pattern, replacement, original_str)

# printing the results
print("Original string:", original_str)
print("New string:", new_str)
print("Number of substitutions:", num_subs)

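Output

Original string: This building has 4 floors. There are 3 flats on each floor.
New string: This building has many floors. There are many flats on each floor.
Number of substitutions: 2

Explanation

In this example, we used the re.subn() function to replace all occurrences of the specified pattern in the given string with the specified replacement. We then stored the new string and the number of substitutions, and printed them.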
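
The substitution count makes it easy to tell whether anything was changed at all. The sketch below shows one possible use; replace_digits is a hypothetical helper written only for this illustration.

```python
import re

# Hypothetical helper: replaces digit runs and reports whether the text changed.
def replace_digits(text):
    new_text, num_subs = re.subn(r'\d+', 'many', text)
    changed = num_subs > 0
    return new_text, changed

with_digits = replace_digits("There are 3 flats on each floor.")
without_digits = replace_digits("There are flats on each floor.")

print(with_digits)
print(without_digits)
```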

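re.escape()

The re.escape() function escapes all special characters in a string so that the string can be used safely as a literal inside a regular expression.

Here is an example that shows the use of the re.escape() function.

Example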

# importing the re module
import re

# using the re.escape() function
print(re.escape("Welcome to Tpoint Tech"))
print(re.escape("We've \t learned various [a-9] concepts of& Python ^!"))

Output

Welcome\ to\ Tpoint\ Tech
We've\ \	\ learned\ various\ \[a\-9\]\ concepts\ of\&\ Python\ \^!

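Explanation

In this example, we used the re.escape() function to escape the special characters in the given strings. In the second output line, the escaped tab appears as a backslash followed by a literal tab character.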
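
A typical use of re.escape() is searching for literal text that happens to contain metacharacters. In the sketch below, the sample text and variable names are made up for illustration.

```python
import re

price_text = "Total: $5.00 (approx.)"
literal = "$5.00"

# Unescaped, '$' means "end of string" and '.' means "any character",
# so the pattern cannot match the literal text
unescaped_match = re.search(literal, price_text)

# Escaped, every character is matched literally
escaped_match = re.search(re.escape(literal), price_text)

print(unescaped_match)
print(escaped_match.group())
```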

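Metacharacters in Python Regular Expressions

In regular expressions, metacharacters are special characters that control how a pattern is matched. They are not treated as literals unless they are escaped with a backslash "\".

The following table lists the metacharacters in Python regular expressions.

Metacharacter	Description
\	Removes the special meaning of the character that follows it.
[]	Denotes a character class.
^	Matches the start of the string.
$	Matches the end of the string.
.	Matches any character except a newline.
|	Means OR: matches the expression on either side of it.
?	Matches zero or one occurrence.
*	Matches any number of occurrences (including zero).
+	Matches one or more occurrences.
{}	Specifies how many occurrences of the preceding expression to match.
()	Groups part of a regular expression.

Let us look at an example of metacharacters.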

Example

# importing the re module
import re

# The . meta-character matches any character (except newline)
text = "cat bat sat mat"
pattern = r".at"  # Matches any single character followed by 'at'
matches = re.findall(pattern, text)
print(matches)

# The ^ meta-character matches the start of the string
text = "The quick brown fox"
pattern = r"^The" # Matches 'The' only if it is at the beginning
matches = re.findall(pattern, text)
print(matches)

# The $ meta-character matches the end of the string
text = "The quick brown fox"
pattern = r"fox$" # Matches 'fox' only if it is at the end
matches = re.findall(pattern, text)
print(matches)

# The * meta-character matches zero or more occurrences of the preceding character
text = "ab abb abbb"
pattern = r"ab*" # Matches 'a' followed by zero or more 'b's
matches = re.findall(pattern, text)
print(matches)

# The + meta-character matches one or more occurrences of the preceding character
text = "ab abb abbb"
pattern = r"ab+" # Matches 'a' followed by one or more 'b's
matches = re.findall(pattern, text)
print(matches)

Output

['cat', 'bat', 'sat', 'mat']
['The']
['fox']
['ab', 'abb', 'abbb']
['ab', 'abb', 'abbb']

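Explanation

This example shows the use of metacharacters such as ., ^, $, *, and +. We used them in different regular expression patterns and called the findall() function to find all matches in the given strings.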
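
The remaining metacharacters from the table (?, {}, |, [], () and the escaping backslash) can be tried in the same way. The following is a short sketch; the sample strings and variable names are chosen only for illustration.

```python
import re

# ? makes the preceding character optional
optional = re.findall(r"colou?r", "color colour")

# {2,3} requires between two and three occurrences of 'b'
repeated = re.findall(r"ab{2,3}", "ab abb abbb abbbb")

# | matches the expression on either side
either = re.findall(r"cat|dog", "cat dog bird")

# [] matches any one character from the class
char_class = re.findall(r"[bc]at", "cat bat sat mat")

# () groups 'ab' so that + applies to the whole group
grouped = re.search(r"(ab)+", "ababab").group()

# \ removes the special meaning of the dot
escaped_dot = re.findall(r"3\.14", "3.14 3x14")

print(optional, repeated, either, char_class, grouped, escaped_dot)
```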

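Special Sequences in Python Regular Expressions

Special sequences are shorthand notations for common character classes or positions. They begin with a backslash "\" followed by a letter or symbol.

The following is a list of commonly used special sequences.

Special Sequence	Description
\d	A digit (0-9).
\D	A non-digit.
\w	A word character (letter, digit, or underscore).
\W	A non-word character.
\s	A whitespace character (space, tab, newline).
\S	A non-whitespace character.
\b	A word boundary.
\B	A non-word boundary.
\A	The start of the string.
\Z	The end of the string.

Now let us look at the following example.

Example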

# importing the re module
import re

# \d Matches a digit
txt = "Welcome to Tpoint 123 Tech"
x = re.findall(r"\d", txt)  # raw string avoids an invalid escape sequence warning
print(x)

# \S Matches a non-whitespace character
txt = "tpoint tech"
x = re.findall(r"\S", txt)
print(x)

# \w Matches a word character (alphanumeric + underscore)
txt = "tpoint world_123"
x = re.findall(r"\w", txt)
print(x)

# \b Matches the boundary between a word and a non-word character
txt = "tpoint tech"
x = re.findall(r"\btech\b", txt)
print(x)

Output

['1', '2', '3']
['t', 'p', 'o', 'i', 'n', 't', 't', 'e', 'c', 'h']
['t', 'p', 'o', 'i', 'n', 't', 'w', 'o', 'r', 'l', 'd', '_', '1', '2', '3']
['tech']

Explanation

This example shows several special sequences used in regular expressions: \d, \S, \w, and \b.
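
The other sequences in the table behave in the same way. As a minimal sketch, the code below applies \D, \s, \W, \A, and \Z to a made-up sample string.

```python
import re

txt = "Room 42, Floor 7"

# \D+ matches runs of non-digit characters
non_digits = re.findall(r"\D+", txt)

# \s matches each whitespace character
spaces = re.findall(r"\s", txt)

# \W matches each character that is not a letter, digit, or underscore
non_word = re.findall(r"\W", txt)

# \A anchors the match to the start of the string, \Z to the end
at_start = re.findall(r"\ARoom", txt)
at_end = re.findall(r"7\Z", txt)

print(non_digits, spaces, non_word, at_start, at_end)
```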

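Match Objects in Python Regular Expressions

When we use functions such as re.search() or re.match(), they return a match object if a match is found. This object holds details about the match, including its content and position.

Let us look at an example.

Example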

# importing the re module
import re

# given string
str_1 = "He is a boy and he plays cricket everyday"

# regex pattern
regex_pattern = r"he"

# Finding the first occurrence and return a match object
match_obj = re.search(regex_pattern, str_1, re.IGNORECASE)
if match_obj:
    print(f"Search match object: {match_obj}")
    print(f"Match start index: {match_obj.start()}")
    print(f"Match end index: {match_obj.end()}")
    print(f"Matched string: {match_obj.group(0)}")

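Output

Search match object: <re.Match object; span=(0, 2), match='He'>
Match start index: 0
Match end index: 2
Matched string: He

Explanation

This example shows several methods of the match object in use.

The following is a list of commonly used match object methods in Python regular expressions.

Method	Description
.start()	Returns the start index of the match.
.end()	Returns the end index of the match (one past its last character).
.span()	Returns a (start, end) tuple.
.group()	Returns the matched string.
.groups()	Returns all capturing groups as a tuple.
.group(n)	Returns the n-th capturing group.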
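
The methods that the example above does not call, span(), groups(), and group(n), can be seen on a pattern with capturing groups. The sketch below reuses the string and pattern from the re.search() section; the variable names on the left are illustrative.

```python
import re

str_1 = 'I have been working as a Web Developer since 2023.'
match_obj = re.search(r"([a-zA-Z]+) (\d+)", str_1)

span = match_obj.span()            # (start, end) of the whole match
whole = match_obj.group(0)         # same as group(): the whole matched text
all_groups = match_obj.groups()    # every capturing group as a tuple
first_group = match_obj.group(1)   # first capturing group
second_group = match_obj.group(2)  # second capturing group

print(span, whole, all_groups, first_group, second_group)
```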

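Conclusion

In this tutorial, we learned about regular expressions in the Python programming language. We saw how regular expressions work in Python and explored the functions of the re module. We also covered other important concepts such as metacharacters, special sequences, and match objects.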

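Python Regular Expression MCQs

1. Which module is used for regular expressions in Python?

a) re
b) regex
c) pattern
d) string

Answer: a) re

2. What does \d match in a regular expression?

a) A letter
b) A space
c) A digit
d) A special character

Answer: c) A digit

3. Which function is used to find all matches of a pattern in a string?

a) match()
b) findall()
c) search()
d) compile()

Answer: b) re.findall()

4. Which of the following matches a single whitespace character?

Answer: a) \s

5. What does the . (dot) metacharacter match?

a) Digits only
b) The end of the string
c) A whitespace character
d) Any single character except a newline

Answer: d) Any single character except a newline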
