DataFrame combination

Question

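I'm working with a large MultiIndex DataFrame that has several index levels, such as Segment, Period and Classification, plus a few result columns, such as Results1 and Results2. The DataFrame consolidated_df is meant to store all of my calculation results: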

import pandas as pd
import numpy as np

segments = ['A', 'B', 'C']
periods = [1, 2]
classification = ['x', 'y']

index_constr = pd.MultiIndex.from_product(
    [segments, periods, classification],
    names=['Segment', 'Period', 'Classification'])

consolidated_df = pd.DataFrame(np.nan, index=index_constr,
                               columns=['Results1', 'Results2'])

print(consolidated_df)

The structure of the (large) DataFrame looks like this:

                               Results1  Results2
Segment Period Classification                    
A       1      x                    NaN       NaN
               y                    NaN       NaN
        2      x                    NaN       NaN
               y                    NaN       NaN
B       1      x                    NaN       NaN
               y                    NaN       NaN
        2      x                    NaN       NaN
               y                    NaN       NaN
C       1      x                    NaN       NaN
               y                    NaN       NaN
        2      x                    NaN       NaN
               y                    NaN       NaN

I loop over all of my segments (A, B and C) and compute the results (which are stored in the DataFrame's columns) with a separate function, calc_function.

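This function returns a DataFrame with exactly the same format as the consolidated DataFrame, except that it reports only one segment at a time (i.e. it is a slice of the consolidated DataFrame).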

Example:

index_result = pd.MultiIndex.from_product(
    [['A'], periods, classification],
    names=['Segment', 'Period', 'Classification'])

result_calc = pd.DataFrame(np.random.randn(4, 2), index=index_result,
                           columns=['Results1', 'Results2'])

print(result_calc)

                               Results1  Results2
Segment Period Classification                    
A       1      x              -1.568351  0.386250
               y               0.679170  1.552551
        2      x              -1.190928 -0.765319
               y               3.254929  1.436295
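The question never shows calc_function itself. For experimenting with the snippets in this thread, a hypothetical stand-in that mirrors result_calc above (random numbers for one segment, the same three index levels and the same columns) might look like the sketch below; the body is a placeholder, not the asker's actual calculation.

```python
import numpy as np
import pandas as pd

periods = [1, 2]
classification = ['x', 'y']

def calc_function(segment):
    """Hypothetical stand-in: returns one segment's slice of consolidated_df."""
    # Same three index levels as consolidated_df, but a single Segment value
    index = pd.MultiIndex.from_product(
        [[segment], periods, classification],
        names=['Segment', 'Period', 'Classification'])
    # Placeholder numbers; the real calculation would go here
    values = np.random.randn(len(index), 2)
    return pd.DataFrame(values, index=index, columns=['Results1', 'Results2'])

result_calc = calc_function('A')
print(result_calc)
```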

I tried to store each result DataFrame in the consolidated DataFrame as follows, but it didn't work:

idx = pd.IndexSlice  # assumed: idx is the usual alias for pd.IndexSlice

for segment in segments:
    # calc_function returns a DataFrame that has the same structure as consolidated_df
    consolidated_df.loc[idx[segment, :, :], :] = calc_function(segment)
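A runnable sketch of that loop, under two assumptions: calc_function below is a hypothetical, deterministic stand-in for the asker's function, and the goal is simply to copy each segment's numbers into place. Rather than relying on pandas to align the right-hand DataFrame with a MultiIndex slice, this version selects the target cells using the result's own labels and assigns plain NumPy values, so the order of the .loc selection alone decides where each number lands.

```python
import numpy as np
import pandas as pd

segments = ['A', 'B', 'C']
periods = [1, 2]
classification = ['x', 'y']
names = ['Segment', 'Period', 'Classification']
columns = ['Results1', 'Results2']

consolidated_df = pd.DataFrame(
    np.nan,
    index=pd.MultiIndex.from_product([segments, periods, classification], names=names),
    columns=columns)

def calc_function(segment):
    """Hypothetical stand-in returning one segment's slice (deterministic values)."""
    index = pd.MultiIndex.from_product([[segment], periods, classification], names=names)
    offset = 10.0 * segments.index(segment)  # A -> 0, B -> 10, C -> 20
    values = np.arange(len(index) * len(columns), dtype=float).reshape(len(index), -1) + offset
    return pd.DataFrame(values, index=index, columns=columns)

for segment in segments:
    result = calc_function(segment)
    # Select the target rows/columns by the result's own labels, then assign
    # plain values: the shapes match, so no index alignment is involved.
    consolidated_df.loc[result.index, result.columns] = result.to_numpy()

print(consolidated_df)
```

Another option is consolidated_df.update(result), which aligns on index and columns and modifies the frame in place; note that it skips NaN values in result instead of copying them over.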

Is there an easy way to integrate the smaller DataFrames into the consolidated DataFrame?

Answer 1

Using the example above, how about simply consolidated_df.ix['A'] = result_calc?

(This is the same as consolidated_df.ix['A', :, :] = result_calc.)

print(consolidated_df)

                               Results1  Results2
Segment Period Classification                    
A       1      x               1.290466  0.228978
               y              -0.276959  0.735192
        2      x               0.757339 -0.787502
               y              -0.609848  0.805773
B       1      x                    NaN       NaN
               y                    NaN       NaN
        2      x                    NaN       NaN
               y                    NaN       NaN
C       1      x                    NaN       NaN
               y                    NaN       NaN
        2      x                    NaN       NaN
               y                    NaN       NaN
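.ix was deprecated in pandas 0.20 and removed in pandas 1.0, so on current versions the one-liner above has to be spelled with .loc. A minimal sketch, assuming pandas 1.0 or later and rebuilding the example frames so it runs on its own: consolidated_df.loc['A'] selects the four rows whose Segment is 'A', and because result_calc carries exactly those three-level labels, the assignment lines up row for row.

```python
import numpy as np
import pandas as pd

segments = ['A', 'B', 'C']
periods = [1, 2]
classification = ['x', 'y']
names = ['Segment', 'Period', 'Classification']
columns = ['Results1', 'Results2']

consolidated_df = pd.DataFrame(
    np.nan,
    index=pd.MultiIndex.from_product([segments, periods, classification], names=names),
    columns=columns)

result_calc = pd.DataFrame(
    np.random.randn(4, 2),
    index=pd.MultiIndex.from_product([['A'], periods, classification], names=names),
    columns=columns)

# .loc spelling of consolidated_df.ix['A'] = result_calc;
# the rows for segments B and C stay NaN
consolidated_df.loc['A'] = result_calc
print(consolidated_df)
```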
