pandas

Question

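Suppose I have a 10-row dataframe with two columns, A and B, as follows:

In Excel, I can calculate a rolling mean like this, excluding the first row:

How can I do this in pandas?

Here is what I have tried: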

import pandas as pd

df = pd.read_clipboard()  # copy the dataframe shown above first; read_clipboard() then populates df
for i in range(1, len(df)):
    df.loc[i, 'B'] = df[['A', 'B']].loc[i-1].mean()

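This gives me the result I want, matching Excel. But is there a better, more idiomatic pandas way? I tried expanding and rolling, but neither produced the expected result.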

Answer 1

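What you have is an exponentially weighted moving average, not a simple moving average. That is why pd.DataFrame.rolling did not work. You are probably looking for pd.DataFrame.ewm.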
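To see why, note that the loop in the question computes B[i] = (A[i-1] + B[i-1]) / 2. That is the recursive EWMA form y[t] = (1 - alpha) * y[t-1] + alpha * x[t] with alpha = 0.5. In pandas, com=1 corresponds to alpha = 1 / (1 + com) = 0.5, and adjust=False selects this recursive form. Here is a minimal sketch of the equivalence; the helper name recursive_ewma is made up for illustration and is not part of pandas:

```python
import pandas as pd

def recursive_ewma(values, alpha):
    """Plain-Python EWMA: y[0] = x[0], y[t] = (1 - alpha) * y[t-1] + alpha * x[t].

    Hypothetical helper for illustration only.
    """
    out = []
    for x in values:
        if not out:
            out.append(float(x))  # the first output is just the first input
        else:
            out.append((1 - alpha) * out[-1] + alpha * x)
    return out

# Input sequence: the seed B[0] = 6 followed by the first lagged values of A.
x = [6, 21, 87, 87, 25]
manual = recursive_ewma(x, alpha=0.5)

# com=1 gives the same smoothing factor as alpha=0.5 (alpha = 1 / (1 + com)).
via_pandas = pd.Series(x, dtype=float).ewm(com=1, adjust=False).mean().tolist()

print(manual)  # [6.0, 13.5, 50.25, 68.625, 46.8125]
```

With alpha = 0.5 each new value is the plain average of the previous output and the current input, which is exactly what the Excel formula does.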

Starting from:

df

Out[399]: 
    A  B
0  21  6
1  87  0
2  87  0
3  25  0
4  25  0
5  14  0
6  79  0
7  70  0
8  54  0
9  35  0

df['B'] = df["A"].shift().fillna(df["B"]).ewm(com=1, adjust=False).mean()
df

Out[401]: 
    A          B
0  21   6.000000
1  87  13.500000
2  87  50.250000
3  25  68.625000
4  25  46.812500
5  14  35.906250
6  79  24.953125
7  70  51.976562
8  54  60.988281
9  35  57.494141
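If you want to convince yourself that the one-liner matches the loop from the question, something like the following should work. It is a sketch under the assumption that the frame is built inline rather than read from the clipboard, and that B is stored as float so the loop can write fractional means without a dtype change:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [21, 87, 87, 25, 25, 14, 79, 70, 54, 35],
    "B": [6.0] + [0.0] * 9,  # float dtype (assumption) so fractional means fit
})

# Loop from the question: B[i] is the mean of the previous row's A and B.
looped = df.copy()
for i in range(1, len(looped)):
    looped.loc[i, "B"] = looped[["A", "B"]].loc[i - 1].mean()

# One-liner: lag A by one row, seed the gap at row 0 with B[0], then smooth.
vectorized = df["A"].shift().fillna(df["B"]).ewm(com=1, adjust=False).mean()

# Both approaches should agree to floating-point tolerance.
assert np.allclose(looped["B"], vectorized)
```

The shift() is what makes each row use the previous row's A, and fillna(df["B"]) only fills the single NaN that shift() leaves in row 0.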

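Even with only 10 rows, this speeds up the code by about 10x according to %timeit (from 10.3 ms down to 959 µs). With 100 rows, the speedup grows to about 100x (1.1 ms versus 110 ms for the loop).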
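To reproduce the comparison outside IPython, a rough sketch with the standard timeit module could look like this. The helper names (make_frame, loop_version, ewm_version) and the random test data are assumptions made for illustration, and absolute timings will vary by machine and pandas version:

```python
import timeit

import numpy as np
import pandas as pd

def make_frame(n, seed=0):
    """Random test frame shaped like the example: integer A, B seeded in row 0."""
    rng = np.random.default_rng(seed)
    b = np.zeros(n)
    b[0] = 6.0
    return pd.DataFrame({"A": rng.integers(0, 100, size=n), "B": b})

def loop_version(df):
    """Row-by-row approach from the question, applied to a copy."""
    out = df.copy()
    for i in range(1, len(out)):
        out.loc[i, "B"] = out[["A", "B"]].loc[i - 1].mean()
    return out["B"]

def ewm_version(df):
    """Vectorized approach from this answer."""
    return df["A"].shift().fillna(df["B"]).ewm(com=1, adjust=False).mean()

for n in (10, 100):
    frame = make_frame(n)
    # Average over a few runs; number=5 keeps the slow loop version quick to test.
    t_loop = timeit.timeit(lambda: loop_version(frame), number=5) / 5
    t_ewm = timeit.timeit(lambda: ewm_version(frame), number=5) / 5
    print(f"n={n}: loop {t_loop * 1e3:.2f} ms, ewm {t_ewm * 1e3:.2f} ms")
```

The loop's cost grows with the number of rows because every iteration does a label-based lookup and assignment, while the ewm call stays close to constant at these sizes.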

pandas - exponentially weighted moving average - similar to excel

python

mean