Pandas DataFrame.sample()

March 17, 2025 | 3 min read

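The Pandas sample() function selects random rows or columns from a DataFrame. When building a model from a large dataset, we often need to draw a smaller random sample of the data, and sample() does exactly that.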
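As a rough illustration of the model-building use case, the sketch below draws a random training subset and treats the remaining rows as a held-out set. The DataFrame, its column names (feature, label), and the 80/20 split are assumptions made up for this example, not part of the original tutorial.

```python
import pandas as pd

# Hypothetical dataset: 10 rows with a feature column and a label column
df = pd.DataFrame({"feature": range(10), "label": [0, 1] * 5})

# Draw 80% of the rows at random for training (seeded so the run is repeatable)
train = df.sample(frac=0.8, random_state=0)

# The rows that were not drawn form the held-out set
test = df.drop(train.index)

print(len(train), len(test))  # 8 2
```

Dropping by index works here because the index labels are unique; with duplicate labels a different approach would be needed.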

Syntax

DataFrame.sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None)

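Parameters

n: An optional integer giving the number of items to return from the axis. It defaults to 1 when frac is also None.
frac: An optional float giving the fraction of items to return, so the sample holds roughly frac * len(DataFrame) rows. It cannot be used together with n.
replace: A Boolean. If True, sampling is done with replacement, so the same row may be drawn more than once. The default is False.
weights: An optional string or ndarray-like. The default, None, gives every item an equal probability.
If a Series is passed, it is aligned with the target object on the index. Index values in weights that are not found in the sampled object are ignored, and index values in the sampled object that are not in weights are assigned a weight of zero.
When sample() is called on a DataFrame with axis=0, weights may be the name of a column.
Unless weights is a Series, it must have the same length as the axis being sampled.
If the weights do not sum to 1, they are normalized to sum to 1.
Missing values in the weights column are treated as zero.
Infinite values are not allowed in the weights column.
random_state: An optional integer or numpy.random.RandomState. An integer is used as the seed for the random number generator; a RandomState object is used as given.
axis: An optional integer or string: 0 or 'index' samples rows, and 1 or 'columns' samples columns. For a DataFrame the default is to sample rows.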
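The parameters above can be tried out on a small frame. This is a minimal sketch; the DataFrame, its column names (a, b, w), and the index labels r0 to r3 are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 2, 3, 4], "b": [10, 20, 30, 40], "w": [0, 0, 1, 3]},
    index=["r0", "r1", "r2", "r3"],
)

# frac: half of the 4 rows gives 2 rows
half = df.sample(frac=0.5, random_state=0)

# n and frac cannot be combined; pandas raises a ValueError
try:
    df.sample(n=2, frac=0.5)
except ValueError as exc:
    both_error = str(exc)

# replace=True allows more draws than there are rows
oversampled = df.sample(n=6, replace=True, random_state=0)

# weights as a column name: the values 0, 0, 1, 3 are normalized to
# 0, 0, 0.25, 0.75, so only r2 and r3 can ever be drawn
weighted = df.sample(n=2, weights="w", random_state=0)

# weights as a Series: labels missing from the Series get weight zero,
# so r1 is the only row that can be drawn
series_weighted = df.sample(n=1, weights=pd.Series({"r1": 1.0}), random_state=0)

# axis=1 samples columns instead of rows
one_column = df.sample(n=1, axis=1, random_state=0)

print(sorted(weighted.index))       # ['r2', 'r3']
print(list(series_weighted.index))  # ['r1']
print(one_column.shape)             # (4, 1)
```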

Return value

It returns a new object of the same type as the caller, containing n items randomly sampled from the caller object.
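A short sketch of what "same type as the caller" means in practice. The frame and its column names (x, y) are illustrative. The ignore_index argument does not appear in the syntax line above; it is available in pandas 1.3 and later, so treat that last call as an assumption about the installed version.

```python
import pandas as pd

df = pd.DataFrame({"x": range(10), "y": range(10, 20)})

rows = df.sample(n=3, random_state=42)         # DataFrame in, DataFrame out
values = df["x"].sample(n=3, random_state=42)  # Series in, Series out

print(type(rows).__name__)    # DataFrame
print(type(values).__name__)  # Series

# The caller is left untouched; sample() returns a new object
print(len(df))                # 10

# Sampled rows keep their original index labels unless ignore_index=True
# is passed (pandas 1.3+), which relabels the result 0..n-1
relabeled = df.sample(n=3, random_state=42, ignore_index=True)
print(list(relabeled.index))  # [0, 1, 2]
```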

Example 1

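import pandas as pd

# Build a small DataFrame with named rows
info = pd.DataFrame({'data1': [2, 4, 8, 0],
                     'data2': [2, 0, 0, 0],
                     'data3': [10, 2, 1, 8]},
                    index=['John', 'Parker', 'Smith', 'William'])
info
# Draw 3 values from a single column (a Series)
info['data1'].sample(n=3, random_state=1)
# Draw 50% of the rows, with replacement
info.sample(frac=0.5, replace=True, random_state=1)
# Draw 2 rows, using the data3 column as sampling weights
# (the output below shows the result of this last statement)
info.sample(n=2, weights='data3', random_state=1)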

Output

         data1  data2  data3
John         2      2     10
William      0      0      8

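Example 2

In this example, we take a CSV file and use sample() to extract random rows from the resulting DataFrame.

The CSV file is named aa.csv and contains the columns Name, Hire Date, Salary, and Leaves Remaining.

Let's write code that extracts random rows from this dataset.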

# import the pandas package
import pandas as pd
# build a DataFrame from the csv file
data = pd.read_csv("aa.csv")
# randomly select one row
row1 = data.sample(n=1)
# display the row
row1
# randomly select two rows
row2 = data.sample(n=2)
# display the rows
row2

Output

          Name         Hire Date    Salary      Leaves Remaining
2     Parker Chapman    02/21/14     45000.0      10
5     Michael Palin     06/28/13     66000.0      8
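Because Example 2 passes no random_state, each run may return different rows. The sketch below shows one way to make the draw repeatable. It reads placeholder CSV text from memory instead of aa.csv; the column names follow the output above, but every row value is invented for illustration.

```python
import io
import pandas as pd

# Placeholder CSV text standing in for aa.csv; every value here is made up
csv_text = """Name,Hire Date,Salary,Leaves Remaining
Employee A,01/01/20,50000.0,10
Employee B,02/01/20,52000.0,8
Employee C,03/01/20,54000.0,6
Employee D,04/01/20,56000.0,4
"""

data = pd.read_csv(io.StringIO(csv_text))

# A fixed seed returns the same rows on every call
first = data.sample(n=2, random_state=7)
second = data.sample(n=2, random_state=7)
print(first.equals(second))  # True

# Sampled rows keep the index labels they had in the source DataFrame
print(first.index.isin(data.index).all())  # True
```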
