Replacing a Column Containing 'yes' and 'no' Values with True and False in Pandas | Python

August 29, 2024 | 6 min read

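DataFrame.replace() replaces values of a DataFrame with other values dynamically. This differs from updating with .loc or .iloc, which require you to specify the location to update with some value.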
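To make the contrast concrete, here is a minimal sketch on a small made-up frame. The column names `grade` and `level` are illustrative assumptions and are not part of the article's data. `replace()` changes a value wherever it occurs, whereas `.loc` changes only the cells you select.

```python
import pandas as pd

# Hypothetical two-column frame, used only for this illustration
df = pd.DataFrame({"grade": ["A", "B", "A"], "level": ["B", "C", "B"]})

# replace() matches by value, in every column where the value appears
by_value = df.replace("B", "Z")

# .loc updates by location: we must say which rows and which column to touch
by_location = df.copy()
by_location.loc[by_location["grade"] == "B", "grade"] = "Z"

print(by_value)
print(by_location)
```

In `by_value` the "B" entries in both columns change, while in `by_location` only the selected cell in `grade` changes.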

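Parameters:

to_replace: str, regex, list, dict, Series, int, float, or None

How to find the values that will be replaced.

numeric, str or regex

numeric: numeric values equal to to_replace will be replaced with value
str: strings exactly matching to_replace will be replaced with value
regex: strings matching the regular expression to_replace will be replaced with value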
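The three scalar rules can be sketched as follows. The Series names and values are made up for illustration; the regex case uses a pattern that matches the whole string so that the entire value is replaced.

```python
import pandas as pd

numbers = pd.Series([0, 1, 2, 1])
answers = pd.Series(["no", "nope", "yes"])

# numeric: every value equal to 1 becomes 100
numbers_out = numbers.replace(1, 100)

# str: only the exact string "no" is replaced; "nope" is left alone
exact_out = answers.replace("no", "N")

# regex: every string matched by the pattern ^no.*$ is replaced
regex_out = answers.replace(r"^no.*$", "N", regex=True)

print(numbers_out.tolist())
print(exact_out.tolist())
print(regex_out.tolist())
```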

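List of str, regex, or numeric

First, if to_replace and value are both lists, they must be the same length.
Second, if regex=True, all of the strings in both lists are interpreted as regular expressions; otherwise they are matched literally. This matters little for value, since only a few substitution regexes are possible.
The str, regex and numeric rules apply as above.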
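A short sketch of the list form, using an invented Series: lists of equal length are paired element by element, and lists of different lengths are rejected with a ValueError.

```python
import pandas as pd

answers = pd.Series(["yes", "no", "maybe", "yes"])

# Equal-length lists are paired by position: "yes" -> "Y", "no" -> "N"
paired = answers.replace(["yes", "no"], ["Y", "N"])

# Lists of different lengths cannot be paired up
try:
    answers.replace(["yes", "no"], ["Y"])
    length_error = None
except ValueError as exc:
    length_error = exc

print(paired.tolist())
print(type(length_error).__name__)
```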

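dict

A dict can be used to specify different replacement values for different existing values. For example, {'a': 'b', 'y': 'z'} replaces the value 'a' with 'b' and 'y' with 'z'. To use a dict in this way, the value parameter should be None.
For a DataFrame, a dict can specify that different values should be replaced in different columns. For example, {'a': 1, 'b': 'z'} looks for the value 1 in column 'a' and the value 'z' in column 'b', and replaces these values with whatever is specified in value. The value parameter should not be None in this case. You can treat this as a special case of passing two lists, except that you are specifying the column to search in.
For a DataFrame, a nested dict such as {'a': {'b': np.nan}} is read as follows: look in column 'a' for the value 'b' and replace it with NaN. The value parameter should be None to use a nested dict in this way. You can nest regular expressions as well. Note that column names (the top-level keys of a nested dict) cannot be regular expressions.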
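The three dictionary forms can be compared side by side. This is a sketch on an invented frame; the replacement strings "P", "Y" and "hit" are arbitrary placeholders.

```python
import pandas as pd

df = pd.DataFrame({"a": ["p", "q", "p"], "b": ["z", "p", "y"]})

# Flat dict, no value argument: old value -> new value, in every column
flat = df.replace({"p": "P", "y": "Y"})

# Flat dict plus a value: keys are column names, so "p" is replaced only
# in column "a" and "z" only in column "b"
per_column = df.replace({"a": "p", "b": "z"}, "hit")

# Nested dict, no value argument: in column "a", replace "p" with "P"
nested = df.replace({"a": {"p": "P"}})

print(flat)
print(per_column)
print(nested)
```

Note that `per_column` leaves the "p" in column "b" untouched, because the dict told pandas to look for "p" only in column "a".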

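None

This means that the regex argument must be a string, a compiled regular expression, or a list, dict, ndarray or Series of such elements. If value is also None, then regex must be a nested dictionary or Series.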
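When to_replace is left as None, the patterns go in the regex argument instead. A small sketch with an invented column `A`:

```python
import pandas as pd

df = pd.DataFrame({"A": ["bat", "foo", "bait"]})

# A single pattern in regex, with the replacement given in value
single = df.replace(regex=r"^ba.$", value="new")

# A dict in regex maps each pattern to its own replacement; value stays None
several = df.replace(regex={r"^ba.$": "new", "foo": "xyz"})

print(single["A"].tolist())
print(several["A"].tolist())
```

The pattern `^ba.$` matches "bat" but not "bait", since it allows exactly one character after "ba".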

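value: scalar, dict, list, str, regex, default None

The value to replace any values matching to_replace with. For a DataFrame, a dict of values can be used to specify which value to use for each column (columns not in the dict will not be filled). Regular expressions, strings, and lists or dicts of such objects are also allowed.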

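inplace: bool, default False

If True, the replacement is performed in place and the method returns None. Note: this will modify any other views on this object (for example, a column taken from a DataFrame).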
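The difference between the two modes can be sketched on a one-column frame (the data is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"Pass": ["yes", "no"]})

# Default (inplace=False): a new object is returned, df is left unchanged
new_df = df.replace("yes", "Y")
unchanged = df["Pass"].tolist()

# inplace=True: df itself is modified and the call returns None
result = df.replace("no", "N", inplace=True)

print(new_df["Pass"].tolist())
print(df["Pass"].tolist())
print(result)
```

Because the in-place call returns None, writing `df = df.replace(..., inplace=True)` would discard the frame.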

limit: int, default None

Maximum size gap to forward or backward fill.

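regex: bool or same types as to_replace, default False

Whether to interpret to_replace and/or value as regular expressions. If this is True, then to_replace must be a string. Alternatively, regex itself can be a regular expression or a list, dict, or array of regular expressions, in which case to_replace must be None.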
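One place where regular expressions help with yes/no data is inconsistent capitalization. The sketch below is an illustration rather than part of the original example: it assumes a made-up Series with mixed-case answers, normalizes them with case-insensitive patterns, and then maps the result to booleans.

```python
import pandas as pd

raw = pd.Series(["Yes", "NO", "yes", "No"])

# With regex=True the dict keys are regular expressions; (?i) ignores case
normalized = raw.replace({r"(?i)^yes$": "yes", r"(?i)^no$": "no"}, regex=True)

# Once the spelling is uniform, the strings can be mapped to booleans
as_bool = normalized.map({"yes": True, "no": False})

print(normalized.tolist())
print(as_bool.tolist())
```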

method: {'pad', 'ffill', 'bfill', None}

The method to use for replacement when to_replace is a scalar, list or tuple and value is None. Changed in version 0.23.0: added to DataFrame.

Returns:

DataFrame: the object after replacement.

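Raises:

AssertionError: If regex is not a bool and to_replace is not None.

TypeError

If to_replace is a dict and value is not a list, dict, ndarray or Series. If to_replace is None and regex is not compilable into a regular expression, or is a list, dict, ndarray or Series.

ValueError

If a list or an ndarray is passed to to_replace and value but they are not the same length.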

Sample DataFrame

Std_data = {'name of the student': ['Ajay', 'Sai', 'Chikky', 'Pavani', 'Pojitha', 'Michael', 'Sri', 'Devi', 'David', 'Gopal'],
            'Scores of the Student': [11.5, 7, 20.5, np.nan, 6, 21, 22.5, np.nan, 10, 30],
            'Number of attempts': [10, 9, 5, 6, 7, 2, 8, 3, 2, 1],
            'Pass': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

The values for each column of a new row would be:

name: 'Anil', score: 18.5, Number of attempts: 1, Pass: 'yes', label: 'k'

Example

The following program builds the DataFrame and converts the 'Pass' column with Series.map() and a dictionary:

import numpy as np
import pandas as pd

# Student data: names, scores, number of attempts, and a yes/no pass flag
Std_data = {
    'name of the student': ['Ajay', 'Sai', 'Chikky', 'Pavani', 'Pojitha', 'Michael', 'Sri', 'Devi', 'David', 'Gopal'],
    'Scores of the Student': [11.5, 7, 20.5, np.nan, 6, 21, 22.5, np.nan, 10, 30],
    'Number of attempts': [10, 9, 5, 6, 7, 2, 8, 3, 2, 1],
    'Pass': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes'],
}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(Std_data, index=labels)

print("Original student data:")
print(df)

# Map each 'yes' to True and each 'no' to False in the 'Pass' column
df['Pass'] = df['Pass'].map({'yes': True, 'no': False})

print("\nAfter replacing 'yes' and 'no' in the 'Pass' column with True and False:")
print(df)

Output

Original student data:
   name of the student  Scores of the Student  Number of attempts  Pass
a                 Ajay                   11.5                  10   yes
b                  Sai                    7.0                   9    no
c               Chikky                   20.5                   5   yes
d               Pavani                    NaN                   6    no
e              Pojitha                    6.0                   7    no
f              Michael                   21.0                   2   yes
g                  Sri                   22.5                   8   yes
h                 Devi                    NaN                   3    no
i                David                   10.0                   2    no
j                Gopal                   30.0                   1   yes

After replacing 'yes' and 'no' in the 'Pass' column with True and False:
   name of the student  Scores of the Student  Number of attempts   Pass
a                 Ajay                   11.5                  10   True
b                  Sai                    7.0                   9  False
c               Chikky                   20.5                   5   True
d               Pavani                    NaN                   6  False
e              Pojitha                    6.0                   7  False
f              Michael                   21.0                   2   True
g                  Sri                   22.5                   8   True
h                 Devi                    NaN                   3  False
i                David                   10.0                   2  False
j                Gopal                   30.0                   1   True
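One behavior of the dictionary-based map() approach is worth checking on real data: any value that is not a key of the dictionary becomes NaN. The sketch below uses a made-up Series containing two such values ("Yes" with a capital letter and "n/a") to show how they can be detected.

```python
import pandas as pd

answers = pd.Series(["yes", "no", "Yes", "n/a"])

# Values missing from the dictionary ("Yes", "n/a") are mapped to NaN
mapped = answers.map({"yes": True, "no": False})

# List the original values that the dictionary did not cover
unmapped = answers[mapped.isna()].tolist()

print(mapped.tolist())
print(unmapped)
```

Inspecting `unmapped` before overwriting the column is a cheap way to catch unexpected spellings.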

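Using the DataFrame.replace() Method

This method replaces values in a DataFrame. The values to match can be given as a string, regex, list, dictionary, Series, number, and so on.

Syntax

DataFrame.replace(to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad', axis=None)

This signature matches older pandas releases; check the documentation for your installed version, because some of these parameters have changed in recent releases.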

Example

The following program performs the same conversion with DataFrame.replace() and a nested dictionary:

import numpy as np
import pandas as pd

# Student data: names, scores, number of attempts, and a yes/no pass flag
Std_data = {
    'name of the student': ['Ajay', 'Sai', 'Chikky', 'Pavani', 'Pojitha', 'Michael', 'Sri', 'Devi', 'David', 'Gopal'],
    'Scores of the Student': [11.5, 7, 20.5, np.nan, 6, 21, 22.5, np.nan, 10, 30],
    'Number of attempts': [10, 9, 5, 6, 7, 2, 8, 3, 2, 1],
    'Pass': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes'],
}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(Std_data, index=labels)

print("Original student data:")
print(df)

# Nested dict: in column 'Pass', replace 'yes' with True and 'no' with False
df = df.replace({'Pass': {'yes': True, 'no': False}})

print("\nAfter replacing 'yes' and 'no' in the 'Pass' column with True and False:")
print(df)

Output

Original student data:
   name of the student  Scores of the Student  Number of attempts  Pass
a                 Ajay                   11.5                  10   yes
b                  Sai                    7.0                   9    no
c               Chikky                   20.5                   5   yes
d               Pavani                    NaN                   6    no
e              Pojitha                    6.0                   7    no
f              Michael                   21.0                   2   yes
g                  Sri                   22.5                   8   yes
h                 Devi                    NaN                   3    no
i                David                   10.0                   2    no
j                Gopal                   30.0                   1   yes

After replacing 'yes' and 'no' in the 'Pass' column with True and False:
   name of the student  Scores of the Student  Number of attempts   Pass
a                 Ajay                   11.5                  10   True
b                  Sai                    7.0                   9  False
c               Chikky                   20.5                   5   True
d               Pavani                    NaN                   6  False
e              Pojitha                    6.0                   7  False
f              Michael                   21.0                   2   True
g                  Sri                   22.5                   8   True
h                 Devi                    NaN                   3  False
i                David                   10.0                   2  False
j                Gopal                   30.0                   1   True
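For comparison, a column with only two possible values can also be converted with a plain comparison instead of map() or replace(). This is a different technique from the ones in the article, shown here as a sketch on a shortened, made-up version of the 'Pass' column.

```python
import pandas as pd

df = pd.DataFrame({"Pass": ["yes", "no", "yes"]}, index=["a", "b", "c"])

# True where the value equals 'yes', False everywhere else
df["Pass"] = df["Pass"].eq("yes")

print(df)
```

The trade-off is that every value other than 'yes', including typos and missing entries, silently becomes False, so this shortcut only suits columns known to be clean.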
