DataFrame combination
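I am working with a large MultiIndex DataFrame that has several index levels, e.g. Segment, Period and Classification, plus several result columns, e.g. Results1 and Results2. The DataFrame consolidated_df should store all of my calculation results: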
import pandas as pd
import numpy as np
segments = ['A', 'B', 'C']
periods = [1, 2]
classification = ['x', 'y']
index_constr = pd.MultiIndex.from_product(
    [segments, periods, classification],
    names=['Segment', 'Period', 'Classification'])
consolidated_df = pd.DataFrame(np.nan, index=index_constr,
                               columns=['Results1', 'Results2'])
print(consolidated_df)
The structure (of the large DataFrame) looks like this:
                               Results1  Results2
Segment Period Classification
A       1      x                    NaN       NaN
               y                    NaN       NaN
        2      x                    NaN       NaN
               y                    NaN       NaN
B       1      x                    NaN       NaN
               y                    NaN       NaN
        2      x                    NaN       NaN
               y                    NaN       NaN
C       1      x                    NaN       NaN
               y                    NaN       NaN
        2      x                    NaN       NaN
               y                    NaN       NaN
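I run a for loop over all of my segments (A, B and C) to calculate the results (which are stored in the DataFrame's columns) using a separate function, calc_function.
This function returns a DataFrame with exactly the same format as the consolidated DataFrame, except that it reports only one segment at a time (i.e. it is a slice of the consolidated DataFrame).
Example: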
index_result = pd.MultiIndex.from_product(
    [['A'], periods, classification],
    names=['Segment', 'Period', 'Classification'])
result_calc = pd.DataFrame(np.random.randn(4, 2), index=index_result,
                           columns=['Results1', 'Results2'])
print(result_calc)
                                Results1   Results2
Segment Period Classification
A       1      x               -1.568351   0.386250
               y                0.679170   1.552551
        2      x               -1.190928  -0.765319
               y                3.254929   1.436295
I tried to store the result DataFrame in the consolidated DataFrame as follows, but without success:
idx = pd.IndexSlice  # assumed: idx is not defined in the original snippet
for segment in segments:
    # calc_function returns a DataFrame that has the same structure as consolidated_df
    consolidated_df.loc[idx[segment, :, :], :] = calc_function(segment)
Is there an easy way to integrate the smaller DataFrames into the consolidated DataFrame?
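Using the example above, how about just consolidated_df.ix['A'] = result_calc?
(This is the same as consolidated_df.ix['A', :, :] = result_calc.)
Note that .ix was deprecated in pandas 0.20 and removed in pandas 1.0; on current versions, .loc is the label-based replacement.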
print(consolidated_df)
                                Results1   Results2
Segment Period Classification
A       1      x                1.290466   0.228978
               y               -0.276959   0.735192
        2      x                0.757339  -0.787502
               y               -0.609848   0.805773
B       1      x                     NaN        NaN
               y                     NaN        NaN
        2      x                     NaN        NaN
               y                     NaN        NaN
C       1      x                     NaN        NaN
               y                     NaN        NaN
        2      x                     NaN        NaN
               y                     NaN        NaN
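Because .ix is no longer available in current pandas (it was removed in 1.0), the same idea can be sketched with .loc and pd.IndexSlice. This is only a minimal sketch under stated assumptions, not the answerer's code: calc_function below is a hypothetical stand-in that returns random numbers, and the assignment goes through to_numpy(), which assumes that each result's rows are in the same order as the target slice.

```python
import numpy as np
import pandas as pd

segments = ['A', 'B', 'C']
periods = [1, 2]
classification = ['x', 'y']
names = ['Segment', 'Period', 'Classification']
columns = ['Results1', 'Results2']

# Preallocated, NaN-filled frame, as in the question.
consolidated_df = pd.DataFrame(
    np.nan,
    index=pd.MultiIndex.from_product(
        [segments, periods, classification], names=names),
    columns=columns,
)

rng = np.random.default_rng(0)


def calc_function(segment):
    """Hypothetical stand-in: random results for a single segment."""
    index = pd.MultiIndex.from_product(
        [[segment], periods, classification], names=names)
    values = rng.standard_normal((len(index), len(columns)))
    return pd.DataFrame(values, index=index, columns=columns)


idx = pd.IndexSlice
results = {}  # kept only so the outcome can be checked afterwards
for segment in segments:
    result_calc = calc_function(segment)
    results[segment] = result_calc
    # Positional assignment into the rows of this segment; this relies on
    # result_calc having its rows in the same order as the target slice.
    consolidated_df.loc[idx[segment, :, :], :] = result_calc.to_numpy()

print(consolidated_df)
```

If the NaN-filled frame does not need to exist up front, concatenating the per-segment results with pd.concat is likely a simpler route, since it needs no slice assignment at all.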