How to iterate over a timespan and calculate some values in a DataFrame using Python?

Question

I have the following dataset:

import pandas as pd

data = {'ReportingDate':['2013/5/31','2013/5/31','2013/5/31','2013/5/31','2013/5/31','2013/5/31',
                         '2013/6/28','2013/6/28',
                         '2013/6/28','2013/6/28','2013/6/28'],
        'MarketCap':[' ',0.35,0.7,0.875,0.7,0.35,' ',1,1.5,0.75,1.25],
        'AUM':[3.5,3.5,3.5,3.5,3.5,3.5,5,5,5,5,5],
        'weight':[' ',0.1,0.2,0.25,0.2,0.1,' ',0.2,0.3,0.15,0.25]}

# Create DataFrame
df = pd.DataFrame(data)
df.set_index('ReportingDate', inplace=True)  # the column is named 'ReportingDate', without a space
df

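This is just a sample of an 8000-row dataset.

The reporting dates run from 2013/5/31 to 2015/10/30 and cover every month in that period, but only the last day of each month. The first row of each month has two missing values (MarketCap and weight). I know that: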

The weights within each month sum to 1
weight * AUM equals MarketCap
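
Taken together, these two constraints determine the missing first row of each month. A minimal sketch of the arithmetic for May 2013, using the sample values above (the variable names are illustrative and not from the original post):

```python
# Known weights for 2013-05-31 (every row except the first, which is missing)
known_weights = [0.1, 0.2, 0.25, 0.2, 0.1]
aum = 3.5  # AUM is the same for every row of the month in the sample

# Constraint 1: the weights sum to 1, so the missing weight is the remainder
missing_weight = 1 - sum(known_weights)

# Constraint 2: MarketCap = weight * AUM
missing_market_cap = missing_weight * aum

print(round(missing_weight, 4), round(missing_market_cap, 4))  # 0.15 0.525
```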

I can get the answer I want for a single month with the lines below:

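may = df.loc["2013-05"]                  # partial-string row selection; needs a DatetimeIndex
a = 1 - may.iloc[1:]['weight'].sum()     # missing weight = 1 - sum of the other weights
b = a * may['AUM'].iloc[0]               # MarketCap = weight * AUM
df.iloc[0, 0] = b                        # MarketCap in the first (missing) row of May
df.iloc[0, 2] = a                        # weight in the first (missing) row of May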

How can I use a loop to get the values for the whole period? Thanks.

Answer 1

One way, using pandas.DataFrame.groupby:

import numpy as np

# If the blanks really are whitespace strings, not NaN: turn them into NaN and make the columns numeric
df = df.replace(r"^\s*$", np.nan, regex=True).astype(float)

# If the index is not already a datetime series
df.index = pd.to_datetime(df.index)

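# For each row: 1 minus the sum of the known weights on that date (sum skips NaN); only the NaN rows use it
s = df["weight"].fillna(1) - df.groupby(df.index.date)["weight"].transform("sum")
df["weight"] = df["weight"].fillna(s)
df["MarketCap"] = df["MarketCap"].fillna(s * df["AUM"])  # MarketCap = weight * AUM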

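Note: this assumes each month contains only a single date (its last day), so grouping by date is equivalent to grouping by year and month. If that is not the case, try: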

s = df["weight"].fillna(1) - df.groupby(df.index.strftime("%Y%m"))["weight"].transform("sum")

Output:

               MarketCap  AUM  weight
ReportingDate                        
2013-05-31         0.525  3.5    0.15
2013-05-31         0.350  3.5    0.10
2013-05-31         0.700  3.5    0.20
2013-05-31         0.875  3.5    0.25
2013-05-31         0.700  3.5    0.20
2013-05-31         0.350  3.5    0.10
2013-06-28         0.500  5.0    0.10
2013-06-28         1.000  5.0    0.20
2013-06-28         1.500  5.0    0.30
2013-06-28         0.750  5.0    0.15
2013-06-28         1.250  5.0    0.25
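
Since the question asked specifically about a loop, here is a rough sketch of an explicit month-by-month version for comparison. It is not part of the answer above. It keeps ReportingDate as a regular column, assumes exactly one missing row per month, and the names `months`, `in_month`, `missing`, and `remainder` are only illustrative. The groupby approach is usually preferable on larger data.

```python
import pandas as pd

data = {
    "ReportingDate": ["2013/5/31"] * 6 + ["2013/6/28"] * 5,
    "MarketCap": [" ", 0.35, 0.7, 0.875, 0.7, 0.35, " ", 1, 1.5, 0.75, 1.25],
    "AUM": [3.5] * 6 + [5.0] * 5,
    "weight": [" ", 0.1, 0.2, 0.25, 0.2, 0.1, " ", 0.2, 0.3, 0.15, 0.25],
}
df = pd.DataFrame(data)
df["ReportingDate"] = pd.to_datetime(df["ReportingDate"], format="%Y/%m/%d")

# Turn the blank strings into NaN and make the columns numeric
for col in ["MarketCap", "weight"]:
    df[col] = pd.to_numeric(df[col], errors="coerce")

# One year-month label per row, e.g. 2013-05
months = df["ReportingDate"].dt.to_period("M")

for month in months.unique():
    in_month = months == month                  # rows belonging to this month
    missing = in_month & df["weight"].isna()    # the row(s) to fill in
    # Assumption: one missing weight per month, so it equals the remainder
    remainder = 1 - df.loc[in_month, "weight"].sum()  # sum() skips NaN
    df.loc[missing, "weight"] = remainder
    df.loc[missing, "MarketCap"] = remainder * df.loc[missing, "AUM"]

print(df)
```

After the loop, the weights in each month should sum to 1 and weight * AUM should match MarketCap in every row, which is a quick way to check the result on the full dataset.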
