pandas: Interpolate NaN with interpolate()

August 29, 2024 | 5 min read

Basic usage of interpolate()

The following pandas.DataFrame is used as an example.

Example

import pandas as pd  # pandas provides DataFrame and interpolate()
import numpy as np   # numpy provides np.nan for the missing values

data = pd.DataFrame({'col1': [0, np.nan, np.nan, 30, 40],
                     'col2': [np.nan, 10, 20, np.nan, np.nan],
                     'col3': [40, np.nan, np.nan, 70, 100]})
print(data)

Output

   col1  col2   col3
0   0.0   NaN   40.0
1   NaN  10.0    NaN
2   NaN  20.0    NaN
3  30.0   NaN   70.0
4  40.0   NaN  100.0

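By default, each column is linearly interpolated. NaN values at the bottom are filled by repeating the last valid value, while NaN values at the top are left unchanged.

Input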
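A minimal sketch of the call that produces the output below, assuming the example `data` frame from above (rebuilt here so the block runs on its own; the name `filled` is only illustrative):

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'col1': [0, np.nan, np.nan, 30, 40],
                     'col2': [np.nan, 10, 20, np.nan, np.nan],
                     'col3': [40, np.nan, np.nan, 70, 100]})

# Default settings: linear interpolation down each column (axis=0)
filled = data.interpolate()
print(filled)
```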

Output

   col1  col2   col3
0   0.0   NaN   40.0
1  10.0  10.0   50.0
2  20.0  20.0   60.0
3  30.0  20.0   70.0
4  40.0  20.0  100.0

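Rows or columns: axis

With axis=1, interpolation runs along each row. NaN values at the far right are filled by repeating the last valid value, while NaN values at the far left are left unchanged.

Input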
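The row-wise call can be sketched as follows, again assuming the example `data` frame (the name `by_row` is illustrative):

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'col1': [0, np.nan, np.nan, 30, 40],
                     'col2': [np.nan, 10, 20, np.nan, np.nan],
                     'col3': [40, np.nan, np.nan, 70, 100]})

# axis=1 interpolates across the columns of each row
by_row = data.interpolate(axis=1)
print(by_row)
```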

Output

   col1  col2   col3
0   0.0  20.0   40.0
1   NaN  10.0   10.0
2   NaN  20.0   20.0
3  30.0  50.0   70.0
4  40.0  70.0  100.0

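Maximum number of consecutive NaN to fill: limit

When NaN values are consecutive, the limit parameter sets the maximum number of them to fill. The default is None, which fills every consecutive NaN.

Input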
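A short sketch of a call with `limit=1`, assuming the example `data` frame (the name `limited` is illustrative). Only the first NaN of each consecutive run is filled:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'col1': [0, np.nan, np.nan, 30, 40],
                     'col2': [np.nan, 10, 20, np.nan, np.nan],
                     'col3': [40, np.nan, np.nan, 70, 100]})

# Fill at most one NaN per consecutive run, moving forward (the default direction)
limited = data.interpolate(limit=1)
print(limited)
```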

Output

   col1  col2   col3
0   0.0   NaN   40.0
1  10.0  10.0   50.0
2   NaN  20.0    NaN
3  30.0  20.0   70.0
4  40.0   NaN  100.0

Direction to fill: limit_direction

The limit_direction parameter sets the direction to fill: 'forward', 'backward', or 'both'.

Code: forward

data = pd.DataFrame({'col1': [0, np.nan, np.nan, 30, 40],
                     'col2': [np.nan, 10, 20, np.nan, np.nan],
                     'col3': [40, np.nan, np.nan, 70, 100]})
print(data.interpolate(limit=1, limit_direction='forward'))

Output

   col1  col2   col3
0   0.0   NaN   40.0
1  10.0  10.0   50.0
2   NaN  20.0    NaN
3  30.0  20.0   70.0
4  40.0   NaN  100.0

Code: backward
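The backward variant can be sketched like this, assuming the same `data` frame and `limit=1` as in the forward example (the name `backward` is illustrative):

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'col1': [0, np.nan, np.nan, 30, 40],
                     'col2': [np.nan, 10, 20, np.nan, np.nan],
                     'col3': [40, np.nan, np.nan, 70, 100]})

# Fill at most one NaN per run, working backward from the next valid value
backward = data.interpolate(limit=1, limit_direction='backward')
print(backward)
```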

Output

   col1  col2   col3
0   0.0  10.0   40.0
1   NaN  10.0    NaN
2  20.0  20.0   60.0
3  30.0   NaN   70.0
4  40.0   NaN  100.0

As noted above, NaN values at the top (or left) are left unchanged by default, but with limit_direction='both' the NaN values at both ends are filled.

Code: both
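A sketch of the call with `limit_direction='both'` and no `limit`, which is the combination consistent with the output below (assuming the example `data` frame; the name `both_ends` is illustrative):

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'col1': [0, np.nan, np.nan, 30, 40],
                     'col2': [np.nan, 10, 20, np.nan, np.nan],
                     'col3': [40, np.nan, np.nan, 70, 100]})

# Fill in both directions, so leading and trailing NaN are covered as well
both_ends = data.interpolate(limit_direction='both')
print(both_ends)
```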

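Output

   col1  col2   col3
0   0.0  10.0   40.0
1  10.0  10.0   50.0
2  20.0  20.0   60.0
3  30.0  20.0   70.0
4  40.0  20.0  100.0

Interpolate, extrapolate, or both: limit_area

The limit_area parameter specifies the region to fill.

'inside': interpolation only (NaN surrounded by valid values)
'outside': extrapolation only (NaN outside the valid values)
None (default): both interpolation and extrapolation

Code: 'inside'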
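A minimal sketch for `limit_area='inside'`, assuming the example `data` frame (the name `inside` is illustrative):

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'col1': [0, np.nan, np.nan, 30, 40],
                     'col2': [np.nan, 10, 20, np.nan, np.nan],
                     'col3': [40, np.nan, np.nan, 70, 100]})

# Only NaN that sit between two valid values are filled
inside = data.interpolate(limit_area='inside')
print(inside)
```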

Output

   col1  col2   col3
0   0.0   NaN   40.0
1  10.0  10.0   50.0
2  20.0  20.0   60.0
3  30.0   NaN   70.0
4  40.0   NaN  100.0

Code: 'outside'
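The opposite case, sketched under the same assumptions (the name `outside` is illustrative):

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'col1': [0, np.nan, np.nan, 30, 40],
                     'col2': [np.nan, 10, 20, np.nan, np.nan],
                     'col3': [40, np.nan, np.nan, 70, 100]})

# Only NaN outside the valid values are filled; the default direction is forward,
# so trailing NaN are filled and leading NaN are not
outside = data.interpolate(limit_area='outside')
print(outside)
```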

Output

   col1  col2   col3
0   0.0   NaN   40.0
1   NaN  10.0    NaN
2   NaN  20.0    NaN
3  30.0  20.0   70.0
4  40.0  20.0  100.0

Code: 'outside' with limit_direction='both'
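Combining `limit_area='outside'` with `limit_direction='both'` is the call consistent with the output below. A sketch, assuming the example `data` frame (the name `outside_both` is illustrative):

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'col1': [0, np.nan, np.nan, 30, 40],
                     'col2': [np.nan, 10, 20, np.nan, np.nan],
                     'col3': [40, np.nan, np.nan, 70, 100]})

# Fill only the outer NaN, at both the top and the bottom
outside_both = data.interpolate(limit_area='outside', limit_direction='both')
print(outside_both)
```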

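Output

   col1  col2   col3
0   0.0  10.0   40.0
1   NaN  10.0    NaN
2   NaN  20.0    NaN
3  30.0  20.0   70.0
4  40.0  20.0  100.0

Note

The word "extrapolation" is used here for convenience. As the results above show, with the default linear method the outer values repeat the endpoint value rather than continuing the line. With the spline interpolation described below, the outer values are genuinely extrapolated instead of repeated.

In-place operation: inplace

As with many other methods, inplace=True updates the object itself.

Example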

data.interpolate(inplace=True)
print(data)

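Output

   col1  col2   col3
0   0.0   NaN   40.0
1  10.0  10.0   50.0
2  20.0  20.0   60.0
3  30.0  20.0   70.0
4  40.0  20.0  100.0

Interpolation method: method

The interpolation method is set by the first argument, method. The default is 'linear' (linear interpolation).

Linear interpolation: linear, index, values

With method='linear' (the default) the index is ignored and the values are treated as equally spaced. With method='index' or method='values', the index values are used as the positions.

Example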

s = pd.Series([0, np.nan, np.nan, 3],
              index=[0, 4, 6, 8])
print(s)

Output

0    0.0
4    NaN
6    NaN
8    3.0
dtype: float64

Input
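A sketch of the default call on this Series, assuming the `s` defined above (the name `linear` is illustrative):

```python
import numpy as np
import pandas as pd

s = pd.Series([0, np.nan, np.nan, 3], index=[0, 4, 6, 8])

# method='linear' (default): rows are treated as equally spaced, the index is ignored
linear = s.interpolate()
print(linear)
```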

Output

0    0.0
4    1.0
6    2.0
8    3.0
dtype: float64

Input
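The index-aware variant can be sketched as follows, assuming the same `s` (the name `by_index` is illustrative). The slope is 3/8 per index unit, so index 4 maps to 1.5 and index 6 to 2.25:

```python
import numpy as np
import pandas as pd

s = pd.Series([0, np.nan, np.nan, 3], index=[0, 4, 6, 8])

# method='index' uses the index values 0, 4, 6, 8 as the x positions
by_index = s.interpolate(method='index')
print(by_index)
```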

Output

0    0.00
4    1.50
6    2.25
8    3.00
dtype: float64

If the index holds strings, method='linear' (the default) still works, but method='index' or method='values' raises an error.

Example

s.index = list('abcd')
print(s)

Output

a    0.0
b    NaN
c    NaN
d    3.0
dtype: float64

Input

s.index = list('abcd')
print(s.interpolate())

Output

a    0.0
b    1.0
c    2.0
d    3.0
dtype: float64
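The failing case with a string index can be sketched as below. The exact exception type and message are an assumption here, since they may differ between pandas versions, so the sketch catches both `TypeError` and `ValueError` and only reports what was raised (the name `raised` is illustrative):

```python
import numpy as np
import pandas as pd

s = pd.Series([0, np.nan, np.nan, 3], index=list('abcd'))

raised = None
try:
    # Strings cannot serve as numeric x positions for method='index'
    s.interpolate(method='index')
except (TypeError, ValueError) as exc:
    raised = exc
    print(type(exc).__name__, exc)
```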

Using existing values: ffill, pad, bfill, backfill

With method='ffill' or method='pad', each NaN is filled with the previous valid value. With method='bfill' or method='backfill', it is filled with the next valid value.

Code: 'ffill'

s = pd.Series([np.nan, 1, np.nan, 2, np.nan])
print(s.interpolate('ffill'))

Output

0    NaN
1    1.0
2    1.0
3    2.0
4    2.0
dtype: float64

Code: 'bfill'
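The matching backward call in the article's style would be `s.interpolate('bfill')`. Since pandas 2.1 deprecates the fill-style method names in `interpolate()` in favor of `ffill()` and `bfill()`, this sketch uses `s.bfill()`, which gives the same result as the output below (the name `back_filled` is illustrative):

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 1, np.nan, 2, np.nan])

# Each NaN takes the next valid value; the trailing NaN has none and stays NaN
back_filled = s.bfill()
print(back_filled)
```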

Output

0    1.0
1    1.0
2    2.0
3    2.0
4    NaN
dtype: float64

With method='ffill' or 'pad', limit_direction must be 'forward'; with method='bfill' or 'backfill', it must be 'backward'. Any other combination raises an error.

Input
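A sketch of a mismatched combination, assuming the same Series as above. On pandas 2.1 and later `method='ffill'` also triggers a `FutureWarning`, which is silenced here, and newer releases may reject the method name itself, so the message printed can differ from the one quoted below:

```python
import warnings

import numpy as np
import pandas as pd

s = pd.Series([np.nan, 1, np.nan, 2, np.nan])

ffill_error = None
with warnings.catch_warnings():
    # Hide the deprecation warning some pandas versions emit for method='ffill'
    warnings.simplefilter('ignore', FutureWarning)
    try:
        s.interpolate('ffill', limit_direction='both')
    except ValueError as exc:
        ffill_error = exc

print(ffill_error)
```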

Error

ValueError: `limit_direction` must be 'forward' for method `ffill`

Input
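The same check for the backward method, sketched under the same assumptions (the warning filter and the possibly different message on newer pandas versions apply here too):

```python
import warnings

import numpy as np
import pandas as pd

s = pd.Series([np.nan, 1, np.nan, 2, np.nan])

bfill_error = None
with warnings.catch_warnings():
    # Hide the deprecation warning some pandas versions emit for method='bfill'
    warnings.simplefilter('ignore', FutureWarning)
    try:
        s.interpolate('bfill', limit_direction='both')
    except ValueError as exc:
        bfill_error = exc

print(bfill_error)
```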

Error

ValueError: `limit_direction` must be 'backward' for method `bfill`

The same result can be obtained with the method parameter of fillna().

Related: pandas: Replace missing values (NaN) with fillna()

s = pd.Series([np.nan, 1, np.nan, 2, np.nan])
print(s.fillna(method='ffill'))

Output

0    NaN
1    1.0
2    1.0
3    2.0
4    2.0
dtype: float64

Input
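The backward counterpart in the article's style would be `s.fillna(method='bfill')`. Because pandas 2.1 deprecates the `method` keyword of `fillna()`, the sketch below uses the dedicated `ffill()` and `bfill()` methods instead, which return the same values as the two outputs in this part (the names `forward` and `backward` are illustrative):

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 1, np.nan, 2, np.nan])

forward = s.ffill()   # counterpart of fillna(method='ffill')
backward = s.bfill()  # counterpart of fillna(method='bfill')

print(forward)
print(backward)
```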

Output

0    1.0
1    1.0
2    2.0
3    2.0
4    NaN
dtype: float64

Spline interpolation: spline

With method='spline', spline interpolation is performed. The order parameter is required.

Example

s = pd.Series([0, 10, np.nan, np.nan, 4, np.nan],
              index=[0, 2, 5, 6, 8, 12])
print(s)

Output

0      0.0
2     10.0
5      NaN
6      NaN
8      4.0
12     NaN
dtype: float64

Input
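A sketch of the spline call, assuming `order=2`, which is the order consistent with the output below: the quadratic through (0, 0), (2, 10) and (8, 4) is y = -0.75x² + 6.5x, giving 13.75 at x = 5 and -30 at x = 12. Note that pandas relies on SciPy for this method, so SciPy must be installed (the name `splined` is illustrative):

```python
import numpy as np
import pandas as pd

s = pd.Series([0, 10, np.nan, np.nan, 4, np.nan],
              index=[0, 2, 5, 6, 8, 12])

# method='spline' needs an explicit order and uses the index as x positions
splined = s.interpolate('spline', order=2)
print(splined)
```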

Output

0      0.00
2     10.00
5     13.75
6     12.00
8      4.00
12   -30.00
dtype: float64

Spline interpolation always uses the index, so the result changes when the index changes.

Example

s.index = range(6)
print(s)

Output

0     0.0
1    10.0
2     NaN
3     NaN
4     4.0
5     NaN
dtype: float64

Input
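The same call on the re-indexed Series, again assuming `order=2` and an installed SciPy. With x positions 0, 1 and 4 the quadratic becomes y = -3x² + 13x, so the filled values change (the name `splined_range` is illustrative):

```python
import numpy as np
import pandas as pd

s = pd.Series([0, 10, np.nan, np.nan, 4, np.nan], index=range(6))

# Same data, different index: the fitted curve and the filled values differ
splined_range = s.interpolate('spline', order=2)
print(splined_range)
```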

Output

0     0.0
1    10.0
2    14.0
3    12.0
4     4.0
5   -10.0
dtype: float64

For this reason, spline interpolation requires a numeric index. If the index holds strings, an error is raised.

Example

s.index = list('abcdef')
print(s)

Output

a     0.0
b    10.0
c     NaN
d     NaN
e     4.0
f     NaN
dtype: float64
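A sketch of the failing spline call on this string-indexed Series. It assumes pandas raises a `ValueError` for a non-numeric index, which is the behavior I am aware of; the exact message may vary by version (the name `spline_error` is illustrative):

```python
import numpy as np
import pandas as pd

s = pd.Series([0, 10, np.nan, np.nan, 4, np.nan], index=list('abcdef'))

spline_error = None
try:
    # A string index gives the spline no numeric x positions to work with
    s.interpolate('spline', order=2)
except ValueError as exc:
    spline_error = exc
    print(exc)
```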
