Apply weight formula over rows of a pandas DataFrame
I have a DataFrame df1, shown below. I copy it to df2 to preserve df1, and then use df3 to hold the calculations done on df2.
df2 = df1.copy()
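I want to compute a weight, e.g. Weight(A) = Price(A) / Sum(row_Prices), and return it below the price in df2, so that each date ends up with 3 rows of data: the price row, the std row and the weight row. I would also like to compute the standard deviation of the row; I guess it would take a similar form.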
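I have tried
df3 = df2.iloc[1:,1:].div(df2.iloc[1:,1:].sum(axis=1), axis=0)
to get the weights and then printed df3, but it is not working.
To get 2 rows per date I tried stacking with .stack(), but I am probably doing it wrong. Help! Thank you.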
A B C D E
2006-04-27 00:00:00
2006-04-28 00:00:00 69.62 69.62 6.518 65.09 69.62
2006-05-01 00:00:00 71.5 71.5 6.522 65.16 71.5
2006-05-02 00:00:00 72.34 72.34 6.669 66.55 72.34
2006-05-03 00:00:00 70.22 70.22 6.662 66.46 70.22
2006-05-04 00:00:00 68.32 68.32 6.758 67.48 68.32
2006-05-05 00:00:00 68 68 6.805 67.99 68
2006-05-08 00:00:00 67.88 67.88 6.768 67.56 67.88
I would like the output to look like this:
A B C D E
2006-04-27 00:00:00
2006-04-28 00:00:00
price 69.62 69.62 6.518 65.09 69.62
weight
std
2006-05-01 00:00:00
price 71.5 71.5 6.522 65.16 71.5
weight
std
2006-05-02 00:00:00
price 72.34 72.34 6.669 66.55 72.34
weight
std
As far as I know, there is no quick and easy way to do what you want.
You need to compute all the data and then combine it into a DataFrame that uses a MultiIndex:
import pandas as pd

# Assumes df holds the prices: dates as the index, columns A..E,
# with the empty 2006-04-27 row already dropped
# Making weight/std DataFrames
cols = list('ABCDE')
weight = df.div(df.sum(axis=1), axis=0)  # each price / sum of its row
std = pd.DataFrame({col: df.std(axis=1) for col in cols})  # row std repeated in every column
# Making MultiIndex DataFrame
mindex = pd.MultiIndex.from_product([['price', 'weight', 'std'], df.index])
new_df = pd.DataFrame(index=mindex, columns=cols)
# Inserting data
new_df.loc['price'] = df.values
new_df.loc['weight'] = weight.values
new_df.loc['std'] = std.values
# Swapping levels so the date becomes the outer index level
new_df = new_df.swaplevel(0, 1).sort_index()
The resulting new_df should look like this:
2006-04-28 price 69.62 69.62 6.518 65.09 69.62
std 27.7829 27.7829 27.7829 27.7829 27.7829
weight 0.248228 0.248228 0.0232397 0.232076 0.248228
2006-05-01 price 71.5 71.5 6.522 65.16 71.5
std 28.4828 28.4828 28.4828 28.4828 28.4828
weight 0.249841 0.249841 0.0227897 0.227687 0.249841
2006-05-02 price 72.34 72.34 6.669 66.55 72.34
std 28.8308 28.8308 28.8308 28.8308 28.8308
weight 0.249243 0.249243 0.0229776 0.229294 0.249243
2006-05-03 price 70.22 70.22 6.662 66.46 70.22
std 28.0509 28.0509 28.0509 28.0509 28.0509
weight 0.247443 0.247443 0.0234758 0.234194 0.247443
2006-05-04 price 68.32 68.32 6.758 67.48 68.32
std 27.4399 27.4399 27.4399 27.4399 27.4399
weight 0.244701 0.244701 0.024205 0.241692 0.244701
2006-05-05 price 68 68 6.805 67.99 68
std 27.3661 27.3661 27.3661 27.3661 27.3661
weight 0.243907 0.243907 0.0244086 0.243871 0.243907
2006-05-08 price 67.88 67.88 6.768 67.56 67.88
std 27.2947 27.2947 27.2947 27.2947 27.2947
weight 0.244201 0.244201 0.0243481 0.24305 0.244201
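To make the arithmetic behind one block of this table concrete, here is a minimal sketch that recomputes the weight and std for the 2006-04-28 row by hand. The variable names are only illustrative, and it assumes the std is the pandas default sample standard deviation (ddof=1).

```python
import pandas as pd

# Prices for 2006-04-28, copied from the question's table
row = pd.Series([69.62, 69.62, 6.518, 65.09, 69.62], index=list("ABCDE"))

# Weight(X) = Price(X) / Sum(row_Prices)
weight = row / row.sum()

# Sample standard deviation of the row (ddof=1 is the pandas default)
std = row.std()

print(weight)
print(std)
```

The weights sum to 1 by construction, and the values should agree with the weight and std lines shown for 2006-04-28.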
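As a side note, I wasn't sure which std you wanted to compute, so I just assumed it is the std of the prices in each row (a single value repeated across the row).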
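The same layout can probably also be built without pre-allocating an empty frame, by letting pd.concat create the outer index level from dictionary keys. This is only a sketch under assumptions: the sample df below is rebuilt from the first three populated rows of the question's table, and row_std is an illustrative name.

```python
import pandas as pd

# Sample prices: the first three populated rows of the question's table
df = pd.DataFrame(
    {
        "A": [69.62, 71.5, 72.34],
        "B": [69.62, 71.5, 72.34],
        "C": [6.518, 6.522, 6.669],
        "D": [65.09, 65.16, 66.55],
        "E": [69.62, 71.5, 72.34],
    },
    index=pd.to_datetime(["2006-04-28", "2006-05-01", "2006-05-02"]),
)

# Row-wise weight: divide every price by the sum of its row
weight = df.div(df.sum(axis=1), axis=0)

# Row-wise sample std, repeated across all columns
row_std = df.std(axis=1)
std = pd.DataFrame({col: row_std for col in df.columns})

# The dict keys become the outer index level; swap so the date is outermost
new_df = (
    pd.concat({"price": df, "weight": weight, "std": std})
    .swaplevel(0, 1)
    .sort_index()
)
print(new_df)
```

Because the frames are concatenated rather than assigned into an empty object-dtype frame, the result keeps a float dtype, which makes later numeric work on new_df simpler.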