Replace specific values in a pandas DataFrame with the mean of the 10 previous and 10 next values

Suppose I have the following DataFrame:

df.Consumption

0        16.208
1        11.193
2         9.845
3         9.348
4         9.091
          ...  
19611     0.000
19612     0.000
19613     0.000
19614     0.000
19615     0.000
Name: Consumption, Length: 19616, dtype: float64

I want to replace the 0 values with the mean of the 10 previous and 10 next values that are not 0.00.

Is there a good way to do this? I was thinking of using the replace and interpolate methods, but I can't see how to write it efficiently.

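This should get you close. It relies on the fact that null values are not counted in the mean, so you can replace the zeros with NaN and then go through the rows.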
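A quick way to see the NaN-skipping behaviour this relies on (a minimal sketch with a made-up three-element Series, not the asker's data):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 0.0, 3.0])

# The zero is a real value, so it pulls the mean down: (1 + 0 + 3) / 3
with_zero = s.mean()

# NaN is skipped by mean() (skipna=True by default): (1 + 3) / 2
without_zero = s.replace(0, np.nan).mean()

print(with_zero, without_zero)
```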

I'm not sure there is a better way that doesn't apply the function row by row.

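Something tells me that an actual loop that updates df on each iteration would give slightly different results, because you would be filling in the nulls as you go, which means the previous 10 values would always have a value.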

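import numpy as np
import pandas as pd
df = pd.DataFrame({'Consumption':[1,1,1,1,1,1,1,1,1,0,2,2,2,2,2,2,2,2,2,2]})
# Turn zeros into NaN so the mean skips them
df.replace(0, np.nan, inplace=True)
# Mean of rows x.name-10 .. x.name+10 (NaN skipped); overwrite=False fills only the NaN cells
df.update(df.apply(lambda x: np.mean(df.Consumption.iloc[max(x.name - 10, 0):x.name + 11]), axis=1).to_frame('Consumption'), overwrite=False)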

Output

    Consumption
0   1.000000
1   1.000000
2   1.000000
3   1.000000
4   1.000000
5   1.000000
6   1.000000
7   1.000000
8   1.000000
9   1.526316
10  2.000000
11  2.000000
12  2.000000
13  2.000000
14  2.000000
15  2.000000
16  2.000000
17  2.000000
18  2.000000
19  2.000000
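The remark above about a real loop can be checked on a tiny made-up Series. This sketch assumes a hypothetical half-window of n = 1 purely for readability, and compares filling every zero from the original data against a loop that writes each filled value back before handling the next zero; with two adjacent zeros the results differ:

```python
import numpy as np
import pandas as pd

n = 1  # hypothetical small half-window so the effect is easy to see
s = pd.Series([1.0, 1.0, 0.0, 0.0, 5.0, 5.0]).replace(0, np.nan)

# Every mean is computed from the original (NaN-containing) data
vectorized = s.fillna(s.rolling(2 * n + 1, min_periods=1, center=True).mean())

# Sequential loop: each filled value is written back before the next zero is processed
looped = s.copy()
for i in np.flatnonzero(looped.isna().to_numpy()):
    window = looped.iloc[max(i - n, 0):i + n + 1]
    looped.iloc[i] = window.mean()  # NaN entries in the window are skipped

print(vectorized.tolist())  # [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
print(looped.tolist())      # [1.0, 1.0, 1.0, 3.0, 5.0, 5.0]
```

In the answer's code, df.apply computes all the means before df.update writes anything, so it behaves like the first variant.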

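You can use Series.rolling() with center=True together with Rolling.mean() to get the mean of the previous and next values.

If you want to exclude 0 from the mean calculation, replace 0 with NaN first.

Setting center=True makes each rolling window cover both the previous and the next entries.

Finally, use .loc to replace the entries whose value is 0 with that mean, as follows: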

import numpy as np

n = 10     # check previous and next 10 entries

# rolling window size is (2n + 1); the NaN values (former zeros) are skipped by mean()
Consumption_mean = (df['Consumption'].replace(0, np.nan)
                                     .rolling(n * 2 + 1, min_periods=1, center=True)
                                     .mean())

df.loc[df['Consumption'] == 0, 'Consumption'] = Consumption_mean
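To see what center=True changes, here is a small sketch on a made-up Series comparing the default trailing window with a centered one of the same size (3, i.e. n = 1):

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Default (trailing) window: row i averages rows i-2, i-1, i
trailing = s.rolling(3, min_periods=1).mean()

# center=True: row i averages rows i-1, i, i+1, i.e. one previous and one next value
centered = s.rolling(3, min_periods=1, center=True).mean()

print(trailing.tolist())  # [1.0, 1.5, 2.0, 3.0, 4.0]
print(centered.tolist())  # [1.5, 2.0, 3.0, 4.0, 4.5]
```

With a window of 2n + 1 and center=True, each row sees n rows before and n rows after; near the start and end the window is truncated, and min_periods=1 still lets a mean be computed there.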

Demo

Using a smaller window size, n = 3, for the demo:

df


    Consumption
0        16.208
1        11.193
2         9.845
3         9.348
4         9.091
5         8.010
6         0.000              <====   target entry
7         7.100
8         0.000              <====   target entry
9         6.800
10        6.500
11        6.300
12        5.900
13        5.800
14        5.600

#n = 10     # check previous and next 10 entries
n = 3     # smaller window size for demo

# rolling window size is (2n + 1)
Consumption_mean = (df['Consumption'].replace(0, np.nan)
                                     .rolling(n * 2 + 1, min_periods=1, center=True)
                                     .mean())

# Update into a new column `Consumption_New` for demo purpose
df['Consumption_New'] = df['Consumption']    
df.loc[df['Consumption'] == 0, 'Consumption_New'] = Consumption_mean

Demo result:

print(df)

    Consumption  Consumption_New
0        16.208          16.2080
1        11.193          11.1930
2         9.845           9.8450
3         9.348           9.3480
4         9.091           9.0910
5         8.010           8.0100
6         0.000           8.0698   # 8.0698 = (9.348 + 9.091 + 8.01 + 7.1 + 6.8) / 5, skipping the target itself and the 0.000 between 7.100 and 6.800
7         7.100           7.1000
8         0.000           6.9420   # 6.942 = (8.01 + 7.1 + 6.8 + 6.5 + 6.3) / 5, skipping the target itself and the 0.000 between 8.010 and 7.100
9         6.800           6.8000
10        6.500           6.5000
11        6.300           6.3000
12        5.900           5.9000
13        5.800           5.8000
14        5.600           5.6000
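Note that the rolling approach averages the non-zero values within n positions on each side, so when other zeros fall inside the window fewer than 2n values are used (5 instead of 6 in both demo rows). If the question is read literally, as the n nearest non-zero values before and after each zero, one possible sketch is below. fill_zeros_with_nonzero_neighbours is a hypothetical helper name, and the loop runs only over the zero positions:

```python
import numpy as np
import pandas as pd

def fill_zeros_with_nonzero_neighbours(s: pd.Series, n: int = 10) -> pd.Series:
    """Hypothetical helper: replace each 0 with the mean of up to n non-zero
    values before it and up to n non-zero values after it."""
    values = s.to_numpy(dtype=float)
    nz_pos = np.flatnonzero(values != 0)   # positions of the non-zero entries
    nz_val = values[nz_pos]
    out = values.copy()
    for i in np.flatnonzero(values == 0):
        k = np.searchsorted(nz_pos, i)     # number of non-zero entries before position i
        neighbours = nz_val[max(k - n, 0):k + n]  # n before and n after (fewer at the edges)
        if neighbours.size:                # empty only if the Series has no non-zero value
            out[i] = neighbours.mean()
    return pd.Series(out, index=s.index, name=s.name)

demo = pd.Series([16.208, 11.193, 9.845, 9.348, 9.091, 8.010, 0.0, 7.1, 0.0,
                  6.8, 6.5, 6.3, 5.9, 5.8, 5.6], name='Consumption')
filled = fill_zeros_with_nonzero_neighbours(demo, n=3)
print(filled.round(4).tolist())
```

On the demo data with n = 3, row 6 becomes about 7.8082 (the mean of 9.348, 9.091, 8.010, 7.100, 6.800, 6.500) instead of 8.0698. Zeros are left unchanged only if the Series contains no non-zero value at all.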