Replace a specific value in a pandas DataFrame with the mean of the 10 previous and next values

Question

Suppose I have the following DataFrame:

df.Consumption

0        16.208
1        11.193
2         9.845
3         9.348
4         9.091
          ...  
19611     0.000
19612     0.000
19613     0.000
19614     0.000
19615     0.000
Name: Consumption, Length: 19616, dtype: float64

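I want to replace the 0 values with the mean of the previous 10 and next 10 values that are not 0.00.

Is there a good way to do this? I was thinking of using the replace and interpolate methods, but I can't see how to write it efficiently.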

Answer 1

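This should get you close. It relies on the fact that null values are not counted in the mean, so you can replace the zeros with NaN and then compute the mean for each row.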
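A quick check of the behaviour this relies on: pandas' `mean()` skips NaN by default (`skipna=True`), so a zero that has been turned into NaN drops out of both the sum and the count. The toy series below is made up for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 0.0, 3.0])  # hypothetical toy data

with_zero = s.mean()                        # zero is counted: (1 + 0 + 3) / 3
without_zero = s.replace(0, np.nan).mean()  # NaN is skipped: (1 + 3) / 2

print(with_zero, without_zero)
```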

I'm not sure there's a better way than applying row by row.

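Something tells me that an actual loop, updating df at each iteration, would give slightly different results: you would be filling in the nulls as you go, so the previous 10 rows would always have a value.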
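To make that difference concrete, here is a small sketch on made-up data with a hypothetical window of n = 1. The first loop computes every window from the original values, which matches what `apply` followed by `update` does; the second writes each fill back immediately, so later windows see earlier fills.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 0.0, 0.0, 3.0]).replace(0, np.nan)  # hypothetical toy data
n = 1  # hypothetical small window for illustration
targets = s.index[s.isna().to_numpy()]  # positions that need filling

# Simultaneous: every window is read from the original data
simultaneous = s.copy()
for i in targets:
    simultaneous.loc[i] = s.iloc[max(i - n, 0):i + n + 1].mean()

# Sequential: each filled value is visible to the windows that follow
sequential = s.copy()
for i in targets:
    sequential.loc[i] = sequential.iloc[max(i - n, 0):i + n + 1].mean()

print(simultaneous.tolist())  # [1.0, 1.0, 3.0, 3.0]
print(sequential.tolist())    # [1.0, 1.0, 2.0, 3.0]
```

The second zero gets 3.0 in the first version (its only non-NaN neighbour is 3.0) but 2.0 in the second, because the value just filled in at the first zero now enters its window.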

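import numpy as np
import pandas as pd
df = pd.DataFrame({'Consumption':[1,1,1,1,1,1,1,1,1,0,2,2,2,2,2,2,2,2,2,2]})
df.replace(0, np.nan, inplace=True)  # zeros become NaN, which mean() skips
# Mean of rows i-10 .. i+10 for every row; overwrite=False only fills the NaN cells
df.update(df.apply(lambda x: df.Consumption.iloc[max(x.name - 10, 0):x.name + 11].mean(), axis=1).to_frame('Consumption'), overwrite=False)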

Output

Consumption
0   1.000000
1   1.000000
2   1.000000
3   1.000000
4   1.000000
5   1.000000
6   1.000000
7   1.000000
8   1.000000
9   1.526316
10  2.000000
11  2.000000
12  2.000000
13  2.000000
14  2.000000
15  2.000000
16  2.000000
17  2.000000
18  2.000000
19  2.000000

Answer 2

You can use Series.rolling() with center=True together with Rolling.mean() to get the mean of the previous and next values.

If you want to exclude 0 from the mean calculation, replace 0 with NaN first.

Set center=True so that the rolling window looks at both the previous and the next entries.
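To see what `center=True` and `min_periods=1` do, here is a sketch on a hypothetical five-element series where NaN stands in for a former zero. With a window of 3, the centred window covers one row before, the row itself and one row after; the default trailing window only looks backwards. `min_periods=1` lets the edges and NaN-containing windows still produce a mean from whatever non-NaN values are present.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, np.nan, 4.0, 5.0])  # hypothetical toy data

# Window of 3 centred on each row: previous row, current row, next row
centred = s.rolling(3, min_periods=1, center=True).mean()
# Default center=False: the window ends at the current row
trailing = s.rolling(3, min_periods=1).mean()

print(centred.tolist())   # [1.5, 1.5, 3.0, 4.5, 4.5]
print(trailing.tolist())  # [1.0, 1.5, 1.5, 3.0, 4.5]
```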

Finally, use .loc to replace the entries whose value is 0 with the mean, as follows:

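import numpy as np
import pandas as pd

n = 10     # check previous and next 10 entries

# rolling window size is (2n + 1); NaN (former zeros) are skipped by mean()
Consumption_mean = (df['Consumption'].replace(0, np.nan)
                                     .rolling(n * 2 + 1, min_periods=1, center=True)
                                     .mean())

# overwrite only the zero entries; the assignment aligns Consumption_mean on the index
df.loc[df['Consumption'] == 0, 'Consumption'] = Consumption_mean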

Demo

Using a smaller window size, n = 3, for the demo:

df


    Consumption
0        16.208
1        11.193
2         9.845
3         9.348
4         9.091
5         8.010
6         0.000              <====   target entry
7         7.100
8         0.000              <====   target entry
9         6.800
10        6.500
11        6.300
12        5.900
13        5.800
14        5.600

#n = 10     # check previous and next 10 entries
n = 3     # smaller window size for demo

# rolling window size is (2n + 1)
Consumption_mean = (df['Consumption'].replace(0, np.nan)
                                     .rolling(n * 2 + 1, min_periods=1, center=True)
                                     .mean())

# Update into a new column `Consumption_New` for demo purpose
df['Consumption_New'] = df['Consumption']    
df.loc[df['Consumption'] == 0, 'Consumption_New'] = Consumption_mean

Demo result:

print(df)

    Consumption  Consumption_New
0        16.208          16.2080
1        11.193          11.1930
2         9.845           9.8450
3         9.348           9.3480
4         9.091           9.0910
5         8.010           8.0100
6         0.000           8.0698   # 8.0698 = (9.348 + 9.091 + 8.01 + 7.1 + 6.8) / 5 with skipping 0.000 between 7.100 and 6.800
7         7.100           7.1000
8         0.000           6.9420   # 6.942 = (8.01 + 7.1 + 6.8 + 6.5 + 6.3) / 5 with skipping 0.000 between 8.010 and 7.100
9         6.800           6.8000
10        6.500           6.5000
11        6.300           6.3000
12        5.900           5.9000
13        5.800           5.8000
14        5.600           5.6000
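Both answers average over a window of positions, so zeros inside the window shrink the count (the demo above divides by 5, not 6) instead of reaching further out for more non-zero values. If you need exactly the n nearest non-zero values on each side, as the question literally asks, one possible approach (a sketch, not taken from either answer) is to work on the non-zero values alone and use prefix sums. The function name `fill_zeros_nonzero_neighbours` is hypothetical, and the sketch assumes the input contains no NaN.

```python
import numpy as np
import pandas as pd

def fill_zeros_nonzero_neighbours(s: pd.Series, n: int = 10) -> pd.Series:
    """Replace each 0 with the mean of up to n non-zero values before it
    and up to n non-zero values after it (hypothetical helper; assumes no NaN)."""
    values = s.to_numpy(dtype=float)
    is_zero = values == 0
    nonzero = ~is_zero
    nz_vals = values[nonzero]
    m = len(nz_vals)

    # k[i] = number of non-zero entries strictly before position i
    k = np.cumsum(nonzero) - nonzero
    # slice [lo, hi) of nz_vals: n values before position i and n after it,
    # clipped at the ends (like min_periods=1)
    lo = np.clip(k - n, 0, m)
    hi = np.clip(k + n, 0, m)

    # prefix[j] = sum of the first j non-zero values
    prefix = np.concatenate([[0.0], np.cumsum(nz_vals)])
    with np.errstate(invalid="ignore", divide="ignore"):
        means = (prefix[hi] - prefix[lo]) / (hi - lo)  # NaN if no non-zero values

    out = values.copy()
    out[is_zero] = means[is_zero]
    return pd.Series(out, index=s.index, name=s.name)
```

Usage would be `df['Consumption'] = fill_zeros_nonzero_neighbours(df['Consumption'], n=10)`. On the n = 3 demo data above, the zero at row 6 becomes the mean of 9.348, 9.091, 8.010, 7.1, 6.8 and 6.5, because the zero at row 8 is skipped and the window extends one value further instead.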
