How to work out the mean over a specific number of months for different customers using Python?

I have a dataframe of customers and their spending in each month, as shown below:

import pandas as pd

data = [['Armin',12,5,11,24,5,4,10,5],['Benji',10,12,10,32,4,18,0,0],['Casey',0,0,30,15,25,5,0,0]]

df = pd.DataFrame(data, columns = ['Name','2019-01','2019-02','2019-03','2019-04','2019-05','2019-06','2019-07','2019-08'])

I need to calculate the 3-month average for each customer, starting from a specified month, as given in the dataframe below:

data2 = [['Armin','2019-04'],['Benji','2019-02'],['Casey','2019-03']]
df2 = pd.DataFrame(data2, columns = ['Name','Specified Month'])

So for Armin, the 3-month average starting from his specified month would be (24 + 5 + 4)/3 = 11

The expected result would look something like this:

df['Specified Average'] = [11,18,23.3]

First get the column positions in df with Index.get_indexer, then select the next 3 values with the help of np.add.outer and take the mean:

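import numpy as np

N = 3
# column position in df of each customer's specified month
# (this relies on df and df2 listing the customers in the same row order)
a = df.columns.get_indexer(df2['Specified Month'])

# row i of the index grid holds the positions a + i; average the N picked values per customer
df2['Specified Average'] = (np.mean(df.values[np.arange(len(df)),
                                             np.add.outer(np.arange(N), a)], axis=0)
                              .astype(float))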
print (df2)
    Name Specified Month  Specified Average
0  Armin         2019-04          11.000000
1  Benji         2019-02          18.000000
2  Casey         2019-03          23.333333
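
To make the indexing step easier to follow, here is a minimal sketch that splits the one-liner into named intermediate arrays. The names months, cols, rows, picked and averages are introduced here only for illustration; the data is the sample from the question, and the sketch assumes that df and df2 list the customers in the same order and that every specified month has at least N columns available from it.

```python
import numpy as np
import pandas as pd

data = [['Armin', 12, 5, 11, 24, 5, 4, 10, 5],
        ['Benji', 10, 12, 10, 32, 4, 18, 0, 0],
        ['Casey', 0, 0, 30, 15, 25, 5, 0, 0]]
months = ['2019-01', '2019-02', '2019-03', '2019-04',
          '2019-05', '2019-06', '2019-07', '2019-08']
df = pd.DataFrame(data, columns=['Name'] + months)
df2 = pd.DataFrame([['Armin', '2019-04'], ['Benji', '2019-02'], ['Casey', '2019-03']],
                   columns=['Name', 'Specified Month'])

N = 3
# Position of each specified month among df's columns ('Name' is position 0).
a = df.columns.get_indexer(df2['Specified Month'])

# Grid of column positions with shape (N, number of customers):
# cols[i, j] = a[j] + i, i.e. the i-th month of customer j's window.
cols = np.add.outer(np.arange(N), a)

# rows has shape (number of customers,) and broadcasts against cols,
# so picked[i, j] = df.values[j, cols[i, j]].
rows = np.arange(len(df))
picked = df.values[rows, cols]

# Average down each column of the grid: one mean per customer.
averages = picked.astype(float).mean(axis=0)
print(cols)
print(averages)
```

If a specified month lies within the last N - 1 columns, the grid points past the end of df.values and the fancy indexing raises an IndexError, which is the case the pandas-only approach is meant to cover.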

Another, pandas-only solution is more general: it also works if some data has no match between the two DataFrames, and if fewer than 3 months follow a specified month:

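s = (df.reset_index()
       # long format: one row per (customer, month)
       .melt(id_vars=['Name','index'], var_name='Specified Month')
       # _merge == 'both' marks each customer's specified month
       .merge(df2, how='left', indicator=True)
       # groups is 0 before the specified month and at least 1 from it onward
       .assign(groups=lambda x: x['_merge'].eq('both').astype(int).groupby(x['Name']).cumsum())
       .query("groups != 0")
       # keep at most N months per customer
       .groupby('Name')
       .head(N)
       # restore the original row order of df
       .sort_values('index')
       .groupby('Name', sort=False)['value']
       .mean()
    )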

print (s)
Name
Armin    11.000000
Benji    18.000000
Casey    23.333333
Name: value, dtype: float64

df2['Specified Average'] = s.values
print (df2)
    Name Specified Month  Specified Average
0  Armin         2019-04          11.000000
1  Benji         2019-02          18.000000
2  Casey         2019-03          23.333333
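
For comparison, a plain-loop sketch of the same calculation is shown below. It is not part of either solution above: the helper name specified_average is hypothetical, and the sketch assumes that customer names are unique in df. It writes results back by name instead of by position, returns NaN for a customer or month that is missing from df, and averages whatever months are available when fewer than n remain.

```python
import pandas as pd

def specified_average(df, df2, n=3):
    """Mean of up to n monthly values per customer, starting at the specified month."""
    wide = df.set_index('Name')          # assumes unique customer names
    out = {}
    for name, month in zip(df2['Name'], df2['Specified Month']):
        if name not in wide.index or month not in wide.columns:
            out[name] = float('nan')     # no data for this customer or month
            continue
        start = wide.columns.get_loc(month)
        # iloc slicing stops at the last column, so short windows are allowed
        out[name] = wide.loc[name].iloc[start:start + n].mean()
    # align by name, so the row order of df2 does not have to match df
    return df2['Name'].map(out)

data = [['Armin', 12, 5, 11, 24, 5, 4, 10, 5],
        ['Benji', 10, 12, 10, 32, 4, 18, 0, 0],
        ['Casey', 0, 0, 30, 15, 25, 5, 0, 0]]
df = pd.DataFrame(data, columns=['Name', '2019-01', '2019-02', '2019-03', '2019-04',
                                 '2019-05', '2019-06', '2019-07', '2019-08'])
df2 = pd.DataFrame([['Armin', '2019-04'], ['Benji', '2019-02'], ['Casey', '2019-03']],
                   columns=['Name', 'Specified Month'])

df2['Specified Average'] = specified_average(df, df2)
print(df2)
```

The loop is slower than the vectorized versions on large frames, but each step maps directly onto the wording of the question, which can make it easier to check.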