Retail Cost Optimization in Python

December 12, 2024 | 13 min read

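Setting the right price for goods and services is essential to maximizing sales and profit. This tutorial is for readers who want to learn how machine learning can support retail price optimization, and it walks through the whole process in Python.

Retail cost optimization: The key to optimizing retail pricing is finding the right balance between the price you charge for a product and the number of units you can sell at that price.

The end goal is a price that yields the most profit while still attracting enough customers to buy. Finding the price that maximizes sales and profit while keeping customers satisfied requires both data and a pricing strategy.

The process therefore needs product prices, service costs, and any other information that affects what a product costs. We found a dataset that fits this need well.

The following sections walk through retail cost optimization with machine learning.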

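Retail Cost Dataset

We begin the retail cost optimization work by importing the required Python libraries. In the highly competitive retail sector, pricing is critical to attracting customers and improving profitability. Pricing decisions are complex and depend on many factors, including market demand, competition, target margins, and cost of goods sold (COGS). To maximize sales while staying profitable, a business has to optimize how it sets prices.

The data comes from a dataset published on Kaggle that focuses on retail price optimization.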

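The dataset's features are listed below.

product_id1: the identifier of each product in the dataset.
product_category_name1: the name of the category the product belongs to.
month_year: the month and year of the retail transaction or record.
qty: the number of units sold or purchased in the transaction.
total_cost1: the total for the products, including any applicable taxes or discounts.
freight_cost: the shipping or freight cost of the product.
unit_cost: the cost of a single unit of the product.
product_name_lenght: the number of characters in the product name (the dataset spells the column `lenght`).
product_description_lenght: the number of characters in the product description (same spelling).
product_photos_qty: the number of product photos available for the item.
product_weight_g: the product's weight in grams.
product_score: a rating or score based on the product's popularity, quality, or other relevant aspects.
customers: the total number of buyers of the product in the transaction.

Main table (for reference only)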

product_id1	product_category_name1	month_year	qty	total_cost1	freight_cost	unit_cost
sofa1	sofa_bath_table	########	1	46.26	16.1	46.26
sofa1	sofa_bath_table	########	4	148.86	12.24444	46.26
sofa1	sofa_bath_table	########	6	286.8	14.84	46.26
sofa1	sofa_bath_table	########	4	184.8	14.2886	46.26
sofa1	sofa_bath_table	########	2	21.2	16.1	46.26
sofa1	sofa_bath_table	########	4	148.86	16.1	46.26
sofa1	sofa_bath_table	########	11	446.86	16.84284	41.6418
sofa1	sofa_bath_table	########	6	242.24	16.24	42.22
sofa1	sofa_bath_table	########	12	862.81	16.64468	42.22
sofa1	sofa_bath_table	########	18	812.82	14.84244	42.22
sofa1	sofa_bath_table	########	18	682.84	16.46246	42.22
sofa1	sofa_bath_table	########	14	612.88	14.24616	42.22
sofa1	sofa_bath_table	########	12	862.81	11.26642	42.22
sofa1	sofa_bath_table	########	6	122.26	14.228	42.22
sofa1	sofa_bath_table	########	8	412.22	21.4186	42.22
sofa1	sofa_bath_table	########	8	414.22	16.44486	42.24
garden6	garden_tools	########	6	412.4	42.68	62.2
garden6	garden_tools	########	4	248.2	44.21668	82.644
garden6	garden_tools	########	21	1266	42.8286	28.6882

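The remaining fields are described below.

weekday_1: the day of the week on which the transaction took place.
weekend: a binary flag indicating whether the transaction took place on a weekend (1).
holiday: a binary flag indicating whether the transaction took place on a holiday (1).
month: the month of the transaction.
year: the year of the transaction.
s: the seasonality effect.
comp_1, comp_2, comp_4: details or variables about competitors' offers, prices, or other relevant factors.
ps1, ps2, ps4: product scores or ratings associated with competitors' products.
fp1, fp2, fp4: freight or shipping costs associated with competitors' products.

This information lets you build a data-driven pricing optimization plan that maximizes revenue.

Main table, continued (for reference only)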

weekday_1	weekend	holiday	month	year	s	volume	comp_1	ps1	fp1
8	1	6	2118	11.26842	4811	82.2	4.2	16.1112
8	1	6	2118	6.614116	4811	82.2	4.2	14.86222
11	1	8	2118	12.18166	4811	82.2	4.2	14.22484
8	1	8	2118	2.224884	4811	82.2	4.2	14.66686
2	1	2	2118	6.666666	4811	82.2	4.2	18.88662
2	2	11	2118	8.444444	4811	82.2	4.2	21.68214
8	4	11	2118	41.66666	4811	82.2	4.2	16.224
11	1	12	2118	16.66668	4811	88.48824	4.2	18.82844
8	2	1	2118	18.86811	4811	86.2	4.2	12.48464
8	2	2	2118	16.82244	4811	86.2	4.2	12.21212
2	1	4	2118	16.88886	4811	86.2	4.2	12.28246
2	1	4	2118	12.14264	4811	86.146	4.2	12.24
8	4	6	2118	11.26842	4811	84.64262	4.2	16.88148
2	1	6	2118	6.614116	4811	82.2	4.2	24.11666
2	1	8	2118	12.18166	4811	88.24444	4.2	12.262
8	1	8	2118	2.224884	4811	84	4.2	18.26681
8	1	4	2118	2.118144	12666	62.2	4.1	42.68
11	2	4	2118	8.688681	12666	82.64444	4.1	44.21668
8	1	6	2118	12.14862	12666	62.2	4.1	12.8426

Reading the Data

Source code snippet

import pandas as pdd
import plotly.express as pxx
import plotly.graph_objects as go
import plotly.io as pioq
pioq.templates.default  =  "plotly_white"

information  =  pdd.read_csv('retail_cost1.csv')
print(information.head())

Output

  product_id1 product_category_name1  month_year  qty  total_cost1  \
1       sofa1        sofa_bath_table  11-16-2118    1        46.26   
1       sofa1        sofa_bath_table  11-16-2118    4       148.86   
2       sofa1        sofa_bath_table  11-18-2118    6       286.81   
4       sofa1        sofa_bath_table  11-18-2118    4       184.81   
4       sofa1        sofa_bath_table  11-12-2118    2        21.21   

   freight_cost  unit_cost  product_name_lenght  product_description_lenght  \. . .
1      16.111111       46.26                   42                         161   
1      12.244444       46.26                   42                         161   
2      14.841111       46.26                   42                         161   
4      14.288611       46.26                   42                         161   
4      16.111111       46.26                   42                         161   
   product_photos_qty . . .  comp_1  ps1        fp1      comp_2  ps2  \. . .
1                   2  ...    82.2  4.2  16.111828  216.111111  4.4   
1                   2  ...    82.2  4.2  14.862216  212.111111  4.4   
2                   2  ...    82.2  4.2  14.224844  216.111111  4.4   
4                   2  ...    82.2  4.2  14.666868  122.612814  4.4   
4                   2  ...    82.2  4.2  18.886622  164.428811  4.4   
         fp2  comp_4  ps4        fp4  lag_cost  
1   8.861111   46.26  4.1  16.111111      46.21  
1  21.422111   46.26  4.1  12.244444      46.26  
2  22.126242   46.26  4.1  14.841111      46.26  
4  12.412886   46.26  4.1  14.288611      46.26  
4  24.424688   46.26  4.1  16.111111      46.26  
[6 rows x 41 cols]

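Before going further, let's check whether the data contains any null values.

Source code snippet

print(information.isnull().sum())

Output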

product_id1                    1
product_category_name1         1
month_year                    1
qty                           1
total_cost1                   1
freight_cost                 1
unit_cost                    1
product_name_lenght           1
product_description_lenght    1
product_photos_qty            1
product_weight_g              1
product_score                 1
customers                     1
weekday_1                       1
weekend                       1
holiday                       1
month                         1
year                          1
s                             1
volume                        1
comp_1                        1
ps1                           1
fp1                           1
comp_2                        1
ps2                           1
fp2                           1
comp_4                        1
ps4                           1
fp4                           1
lag_cost                     1
dtype: int64

Now let's look at the descriptive statistics of the data.

Source code snippet

print(information.describe())

Output

              qty   total_cost1  freight_cost  unit_cost  \
count  686.111111    686.111111     686.111111  686.111111   
mean    14.426662   1422.818828      21.682281  116.426811   
std     16.444421   1811.124111      11.181818   86.182282   
min      1.111111     12.211111       1.111111   12.211111   
26%      4.111111    444.811111      14.861212   64.211111   
61%     11.111111    818.821111      18.618482   82.211111   
86%     18.111111   1888.422611      22.814668  122.221111   
max    122.111111  12126.111111      82.861111  464.111111   

       product_name_lenght  product_description_lenght  product_photos_qty  \
count           686.111111                  686.111111          686.111111   
mean             48.821414                  868.422418            1.224184   
std               2.421816                  666.216116            1.421484   
min              22.111111                  111.111111            1.111111   
26%              41.111111                  442.111111            1.111111   
61%              61.111111                  611.111111            1.611111   
86%              68.111111                  214.111111            2.111111   
max              61.111111                 4116.111111            8.111111   

       product_weight_g  product_score   customers  ...      comp_1  \
count        686.111111     686.111111  686.111111  ...  686.111111   
mean        1848.428621       4.186614   81.128118  ...   82.462164   
std         2284.818484       1.242121   62.166661  ...   48.244468   
min          111.111111       4.411111    1.111111  ...   12.211111   
26%          448.111111       4.211111   44.111111  ...   42.211111   
61%          261.111111       4.111111   62.111111  ...   62.211111   
86%         1861.111111       4.211111  116.111111  ...  114.266642   
max         2861.111111       4.611111  442.111111  ...  442.211111   
              ps1         fp1      comp_2         ps2         fp2      comp_4  \
count  686.111111  686.111111  686.111111  686.111111  686.111111  686.111111   
mean     4.162468   18.628611   22.241182    4.124621   18.621644   84.182642   
std      1.121662    2.416648   42.481262    1.218182    6.424184   48.846882   
min      4.811111    1.126442   12.211111    4.411111    4.411111   12.211111   
26%      4.111111   14.826422   64.211111    4.111111   14.486111   64.886814   
61%      4.211111   16.618284   82.221111    4.211111   16.811866   62.211111   
86%      4.211111   12.842611  118.888882    4.211111   21.666248   22.221111   
max      4.611111   68.241111  442.211111    4.411111   68.241111  266.611111   
              ps4         fp4   lag_cost  
count  686.111111  686.111111  686.111111  
mean     4.112181   18.266118  118.422684  
std      1.244222    6.644266   86.284668  
min      4.611111    8.681111   12.861111  
26%      4.211111   16.142828   66.668861  
61%      4.111111   16.618111   82.211111  
86%      4.111111   12.448888  122.221111  
max      4.411111   57.231111  364.111111  

[8 rows x 29 cols]

Now let's look at the distribution of total product cost.

Source code snippet

figure = pxx.histogram(information,   x = 'total_cost1', 
                   nbins = 21,    title = 'Distribution of Total Cost')
figure.show()

Output

Next, let's examine the distribution of unit cost with a box plot.

Source code snippet

figure = pxx.box(information,   y = 'unit_cost',  title = 'Box Plot of Unit Cost')
figure.show()

Output

Now let's check how quantity relates to total cost.

Source code snippet

figure = pxx.scatter(information,  x = 'qty',  y = 'total_cost1',   title = 'Quantity vs Total Cost', trendline = "ols") 
figure.show()

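Output

Quantity and total cost therefore have a simple linear relationship. This suggests that pricing is based on a fixed unit cost, with the total equal to the quantity multiplied by the unit cost.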
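The relationship can also be checked numerically instead of by eye. The sketch below is a minimal illustration on a small made-up frame (the `toy` rows are assumptions, not values from the Kaggle file); on the real data, the same two checks can be run against `information`.

```python
import pandas as pd

# Toy rows for illustration only; these values are not from the dataset.
toy = pd.DataFrame({
    "qty": [1, 3, 6, 4, 2],
    "unit_cost": [45.0, 45.0, 45.0, 40.0, 40.0],
})
toy["total_cost1"] = toy["qty"] * toy["unit_cost"]

# Pearson correlation between quantity and total cost.
corr = toy["qty"].corr(toy["total_cost1"])

# Largest absolute gap between the recorded total and qty * unit_cost.
max_gap = (toy["total_cost1"] - toy["qty"] * toy["unit_cost"]).abs().max()

print(f"correlation: {corr:.3f}, max gap: {max_gap:.2f}")
```

A correlation near 1 together with a gap near 0 would support the fixed-unit-cost reading; a large gap on the real data would indicate discounts, taxes, or changing unit costs.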

Now let's look at the average total cost for each product category.

Source code snippet

figure = pxx.bar(information, x='product_category_name1', y='total_cost1',
                 title='Average Total Cost by Product Category')
figure.show()
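
One caveat about the bar chart above: when the raw frame is passed to a Plotly Express bar chart, every row becomes its own bar segment and the segments stack, so the bar heights reflect category totals rather than averages. A minimal sketch of aggregating first, on a made-up frame (the `toy` rows are assumptions for illustration):

```python
import pandas as pd

# Toy rows for illustration; category names echo the sample table.
toy = pd.DataFrame({
    "product_category_name1": ["sofa_bath_table", "sofa_bath_table", "garden_tools"],
    "total_cost1": [100.0, 300.0, 250.0],
})

# Aggregate first so that each category contributes exactly one bar.
avg_by_category = (
    toy.groupby("product_category_name1", as_index=False)["total_cost1"].mean()
)
print(avg_by_category)

# With Plotly installed, the aggregated frame can be plotted as before:
# pxx.bar(avg_by_category, x="product_category_name1", y="total_cost1")
```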

Output

Now let's use a box plot to see how total cost varies by weekday_1.

Source code snippet

figure = pxx.box(information, x = 'weekday_1',  y = 'total_cost1',  title = 'Box Plot of Total Cost by Weekday_1')
figure.show()

Output

Next, a box plot shows how total cost breaks down by holiday.

Source code snippet

figure = pxx.box(information, x = 'holiday',  y = 'total_cost1',  title = 'Box Plot of Total Cost by Holiday')
figure.show() 

Output

Now let's examine the relationships among the numerical features.

Source code snippet

# numeric_only=True skips text columns such as product_id1
correlation_matrix = information.corr(numeric_only=True)
figure = go.Figure(go.Heatmap(x=correlation_matrix.columns, y=correlation_matrix.columns, z=correlation_matrix.values))
figure.update_layout(title='Correlation Heatmap of Numerical Features')
figure.show()

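Output

Optimizing retail cost requires a thorough review of competitors' pricing strategies. Depending on the retailer's positioning and strategy, monitoring and benchmarking competitor prices helps reveal opportunities to price competitively, whether below or above the competition. Now let's compute the average difference from the competitor's price for each product category.

Source code snippet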

# Difference between our unit cost and the first competitor's price
information['comp_cost_diff'] = information['unit_cost'] - information['comp_1']
avg_cost_diff_by_category = information.groupby('product_category_name1')['comp_cost_diff'].mean().reset_index()
figure = pxx.bar(avg_cost_diff_by_category, x='product_category_name1',
                 y='comp_cost_diff', title='Average Competitor Cost Difference by Product Category')
figure.update_layout(
    xaxis_title='Product Category',
    yaxis_title='Average Competitor Cost Difference')
figure.show()

Output

Well-Known Optimization Approaches

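Traditional marketers have mostly set prices by intuition, paying little attention to consumer behavior, market trends, promotions, and the effect of holidays, or to how these factors change a product's price sensitivity. Now that high computing power makes it possible to analyze large volumes of data, most companies use big-data techniques to optimize pricing decisions. This yields more competitive prices while still meeting the target for sell-through, revenue, or profit.

Finding the best price or discount rate can be hard for several reasons. One is the complex structure of pricing schemes, which usually involve several elements to optimize, such as list prices, discounts, and special offers. Another is that new pricing schemes are hard to evaluate, because forecasting demand and profit is complex. Choosing the best modeling and optimization methods is the technical challenge in the process.

The rest of this section describes how to set a price that maximizes one metric while holding another at or above a minimum. It accounts for consumer behavior, holidays, competitor pricing, cannibalization, promotion effectiveness, and, most importantly, how to account for costs when determining all of this. For example, a store may need to maximize summer sales of winter items while keeping its profit margin at or above 21%.

To help with implementation, some relevant code is also provided.

It generally helps to write as much of the code as possible in PySpark to speed up big-data operations. Many PySpark libraries are still being created and improved, but the ones for preprocessing (common calculations and aggregations) and generalized linear modeling are mature and very useful. Because PySpark currently lacks a fully tested optimization library that meets the needs of this use case, the early stages are done in PySpark and the optimization step in Python.

Users without a Spark setup, or with a moderate amount of data, can implement the same steps in Python with the appropriate syntax changes.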

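Clustering

This stage can be applied in two ways:

1) Group stores with comparable product affinity and customer behavior to reduce the number of models and address data sparsity. Each key is then modeled within a store cluster rather than within each individual store.
2) If computing power is not a constraint, use the cluster identifier as a feature so that the model can learn from related stores.

The k-means clustering method can be used to determine the clusters from the available data.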
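As a rough sketch of the clustering step, the example below groups six made-up stores with scikit-learn's `KMeans`. The two store features, their values, and the choice of two clusters are assumptions for illustration; the tutorial's dataset does not include store-level columns.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical per-store features: [average basket value, weekly customers].
store_features = np.array([
    [20.0, 500.0], [22.0, 520.0], [19.0, 480.0],   # small-basket, busy stores
    [80.0, 120.0], [85.0, 110.0], [78.0, 130.0],   # large-basket, quiet stores
])

# Scale so that neither feature dominates the Euclidean distance.
scaled = StandardScaler().fit_transform(store_features)

# Fit k-means and get one cluster label per store.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
store_cluster = kmeans.fit_predict(scaled)
print(store_cluster)
```

The resulting label can either define the group on which each pricing model is trained or be passed to a single model as an extra feature, matching the two options above.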

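Modeling

A price elasticity model is usually recommended, because its coefficients can be used to build the optimization equation and to check whether the right features receive appropriate weight.

Price elasticity gives the percentage change in quantity sold for a one percent change in price. When the log of volume is plotted against the log of price, this coefficient is the slope.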
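To make the slope interpretation concrete, here is a minimal sketch that generates synthetic price and volume data with a known elasticity and recovers it with a least-squares line in log-log space. The data, the value -1.5, and the noise level are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data generated with a known elasticity of -1.5 (an assumption).
true_elasticity = -1.5
price = rng.uniform(5.0, 50.0, size=200)
noise = rng.normal(0.0, 0.05, size=200)
volume = np.exp(8.0 + true_elasticity * np.log(price) + noise)

# Fit log(volume) = intercept + elasticity * log(price) by least squares.
elasticity, intercept = np.polyfit(np.log(price), np.log(volume), deg=1)
print(f"estimated elasticity: {elasticity:.2f}")
```

An elasticity of -1.5 reads as: a 1% price increase is associated with roughly a 1.5% drop in volume.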

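Before fitting the model, standardize or normalize the values and apply any further preprocessing needed for the data to satisfy the assumptions of linear regression.

Depending on how much variable selection is needed, you can try ordinary least squares (OLS) regression first and then move on to ridge or lasso models.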
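The sketch below compares the three model families on synthetic data where only two of five features matter. It uses scikit-learn pipelines so that standardization happens before fitting; the data and the `alpha` values are assumptions picked for illustration, not tuned settings.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Synthetic features: only the first two columns drive the target.
X = rng.normal(size=(300, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=300)

models = {
    "ols": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1),
}

coefs = {}
for name, estimator in models.items():
    # Standardize inside the pipeline so the penalty treats features equally.
    pipeline = make_pipeline(StandardScaler(), estimator)
    pipeline.fit(X, y)
    coefs[name] = pipeline[-1].coef_
    print(name, np.round(coefs[name], 2))
```

Lasso tends to push the coefficients of uninformative features to exactly zero, which is why it is a common choice when variable selection matters; ridge shrinks coefficients without removing them.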

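Once the model has been selected and tuned, its coefficients give you the parameters of the equation.

The final equation for each product therefore looks like this:

$$\log(\text{volume1}) = \text{cost elasticity} \cdot \log(\text{cost1}) + \beta_2 \cdot \text{cannibal1\_cost1} + \beta_3 \cdot \text{cannibal2\_cost1} + \beta_4 \cdot \text{holidayflag2} + \dots \qquad \text{(Equation 1)}$$

The remaining terms depend on variable importance.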
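
Once Equation 1 has been fitted, it can drive a simple price search. The sketch below is one possible approach under stated assumptions, not the tutorial's own optimizer: it folds all non-price terms into a single hypothetical `intercept`, treats the `cost1` term as the selling price, and uses a grid search to maximize profit while keeping the margin at or above the 21% floor used as an example earlier. All parameter values are made up.

```python
import numpy as np

# Hypothetical fitted parameters for one product (not from the dataset).
intercept = 8.0      # constant term with all non-price effects folded in
elasticity = -2.0    # coefficient on log(price)
unit_cost = 10.0     # what one unit costs the retailer
min_margin = 0.21    # required (price - cost) / price

def predicted_volume(price):
    """Volume implied by log(volume) = intercept + elasticity * log(price)."""
    return np.exp(intercept + elasticity * np.log(price))

# Candidate prices on a grid; keep only those that meet the margin floor.
candidates = np.arange(10.5, 40.0, 0.01)
margin = (candidates - unit_cost) / candidates
feasible = candidates[margin >= min_margin]

# Profit per candidate price, then pick the most profitable one.
profit = (feasible - unit_cost) * predicted_volume(feasible)
best_price = feasible[np.argmax(profit)]
print(f"best price: {best_price:.2f}")
```

For a constant elasticity e below -1 and unit cost c, the unconstrained profit maximum is at c * e / (e + 1), which is 20 for these values, so the grid result can be checked by hand. A grid search is used here because it stays valid when more constraints or non-smooth terms are added.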

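Retail Cost Optimization Model with Machine Learning

Now let's train a machine learning model for retail cost optimization. A model for this problem can be trained as follows.

Source code snippet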

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# Features and target; comp_cost_diff is the column created earlier
X1 = information[['qty', 'unit_cost', 'comp_1',
                  'product_score', 'comp_cost_diff']]
y = information['total_cost1']

# Hold out 20% of the rows for testing
X1_train, X1_test, y_train, y_test = train_test_split(X1, y,
                                                      test_size=0.2,
                                                      random_state=42)

# Train a decision tree regressor
model = DecisionTreeRegressor()
model.fit(X1_train, y_train)

# Let's make predictions now and compare the predicted and actual retail costs
y_pred = model.predict(X1_test)
figure = go.Figure()
figure.add_trace(go.Scatter(x=y_test, y=y_pred, mode='markers',
                            marker=dict(color='blue'),
                            name='Predicted vs. Actual Retail Cost'))
# Diagonal reference line: points on it are predicted exactly
figure.add_trace(go.Scatter(x=[min(y_test), max(y_test)], y=[min(y_test), max(y_test)],
                            mode='lines',
                            line=dict(color='red'),
                            name='Ideal Prediction'))
figure.update_layout(
    title='Predicted vs. Actual Retail Cost',
    xaxis_title='Actual Retail Cost',
    yaxis_title='Predicted Retail Cost')
figure.show()

Output

Complete Code for Retail Cost Optimization Using Python

import pandas as pdd
import plotly.express as pxx
import plotly.graph_objects as go
import plotly.io as pioq #install Plotly by pip install plotly
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# Set the default template for Plotly
pioq.templates.default = "plotly_white"

# Read the dataset
information = pdd.read_csv('retail_cost1.csv')

# Print first few rows of the dataset
print(information.head())

# Check for missing values
print(information.isnull().sum())
# Descriptive statistics
print(information.describe())
# Histogram for 'total_cost1'
figure = pxx.histogram(information,
                       x='total_cost1',
                       nbins=21,
                       title='Distribution of Total Cost')
figure.show()

# Box plot for 'unit_cost'
figure = pxx.box(information,
                 y='unit_cost',
                 title='Box Plot of Unit Cost')
figure.show()

# Scatter plot for 'qty' vs 'total_cost1'
figure = pxx.scatter(information,
                     x='qty',
                     y='total_cost1',
                     title='Quantity vs Total Cost',
                     trendline="ols")
figure.show()

# Bar chart for 'product_category_name1' vs 'total_cost1'
figure = pxx.bar(information,
                 x='product_category_name1',
                 y='total_cost1',
                 title='Average Total Cost by Product Category')
figure.show()

# Box plot for 'weekday_1' vs 'total_cost1'
figure = pxx.box(information,
                 x='weekday_1',
                 y='total_cost1',
                 title='Box Plot of Total Cost by Weekday')
figure.show()

# Box plot for 'holiday' vs 'total_cost1'
figure = pxx.box(information,
                 x='holiday',
                 y='total_cost1',
                 title='Box Plot of Total Cost by Holiday')
figure.show()

# Correlation heatmap
correlation_matrix = information.corr(numeric_only=True)  # skip text columns
figure = go.Figure(data=go.Heatmap(
    x=correlation_matrix.columns,
    y=correlation_matrix.columns,
    z=correlation_matrix.values,
    colorscale='Viridis'))
figure.update_layout(title='Correlation Heatmap of Numerical Features')
figure.show()

# Calculate competitor cost difference
information['comp_cost_diff'] = information['unit_cost'] - information['comp_1']

# Average cost difference by category
avg_cost_diff_by_category = information.groupby('product_category_name1')['comp_cost_diff'].mean().reset_index()

# Bar chart for average competitor cost difference
figure = pxx.bar(avg_cost_diff_by_category,
                 x='product_category_name1',
                 y='comp_cost_diff',
                 title='Average Competitor Cost Difference by Product Category')
figure.update_layout(
    xaxis_title='Product Category',
    yaxis_title='Average Competitor Cost Difference'
)
figure.show()

# Prepare data for model
X1 = information[['qty', 'unit_cost', 'comp_1', 'product_score', 'comp_cost_diff']]
y = information['total_cost1']

# Split the data into train and test sets
X1_train, X1_test, y_train, y_test = train_test_split(X1, y, test_size=0.2, random_state=42)

# Train a Decision Tree Regressor
model = DecisionTreeRegressor()
model.fit(X1_train, y_train)

# Make predictions and evaluate the model
y_pred = model.predict(X1_test)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse}")

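That is how Python and machine learning can be used to optimize retail pricing.

Summary

The end goal of retail cost optimization is a price that maximizes your profit while attracting a customer base large enough to sustain the business. Finding the price that maximizes revenue and sales while keeping customers satisfied requires both data and a pricing strategy. I hope you enjoyed this article on machine learning for retail pricing optimization in Python.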
