How to merge multiple DataFrames into one MultiIndex DataFrame where each merged DataFrame becomes a second-level column
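What I want to achieve (without too much mess) is to merge 3 different DataFrames that all have the same columns and index, but each represents a different category.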
df1
                                  Children  Movie enthusiast
household
06f32e6e45da385834dac983256d59f3  0.086158               NaN
0d1974107c6731989c762e96def73568  0.120285          0.187764
0fd4f3b4adf43682f08e693a905b7432  0.400000          0.114686
11e0057cdc8b8e1b1cdabfa8a092ea5f       NaN          0.140000
120549af6977623bd01d77135a91a523  0.335238          0.192578
df2
                                  Children  Movie enthusiast
household
06f32e6e45da385834dac983256d59f3       1.0               0.0
0d1974107c6731989c762e96def73568       4.0              11.0
0fd4f3b4adf43682f08e693a905b7432       1.0               5.0
11e0057cdc8b8e1b1cdabfa8a092ea5f       0.0               2.0
120549af6977623bd01d77135a91a523       7.0               9.0
df3
                                  Children  Movie enthusiast
household
06f32e6e45da385834dac983256d59f3       NaN               NaN
0d1974107c6731989c762e96def73568     0.138             0.037
0fd4f3b4adf43682f08e693a905b7432       NaN             0.025
11e0057cdc8b8e1b1cdabfa8a092ea5f       NaN             0.153
120549af6977623bd01d77135a91a523     0.091             0.021
df_merged (filled in by hand, so not all values are present, but you get the idea)
                                  Children              Movie enthusiast
                                       df1  df2    df3               df1  df2  df3
household
06f32e6e45da385834dac983256d59f3  0.086158    1    NaN               NaN  NaN  NaN
0d1974107c6731989c762e96def73568  0.120285    4  0.138          0.187764  NaN  NaN
0fd4f3b4adf43682f08e693a905b7432  0.400000    1    NaN          0.114686  NaN  NaN
11e0057cdc8b8e1b1cdabfa8a092ea5f       NaN    0    NaN          0.140000  NaN  NaN
120549af6977623bd01d77135a91a523  0.335238    7  0.091          0.192578  NaN  NaN
I think you need concat with the keys parameter, then swaplevel and sort_index to get the desired MultiIndex format:
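import pandas as pd

# keys labels each frame on the top column level; swaplevel then moves
# those labels to the second level, and sort_index groups by category.
df = (pd.concat([df1, df2, df3], keys=['df1', 'df2', 'df3'], axis=1)
        .swaplevel(0, 1, axis=1)
        .sort_index(axis=1))
print(df)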
                                  Children              Movie enthusiast
                                       df1  df2    df3               df1   df2    df3
household
06f32e6e45da385834dac983256d59f3  0.086158  1.0    NaN               NaN   0.0    NaN
0d1974107c6731989c762e96def73568  0.120285  4.0  0.138          0.187764  11.0  0.037
0fd4f3b4adf43682f08e693a905b7432  0.400000  1.0    NaN          0.114686   5.0  0.025
11e0057cdc8b8e1b1cdabfa8a092ea5f       NaN  0.0    NaN          0.140000   2.0  0.153
120549af6977623bd01d77135a91a523  0.335238  7.0  0.091          0.192578   9.0  0.021
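For anyone who wants to try this without the original data, here is a minimal self-contained sketch of the same concat / swaplevel / sort_index chain. The short index labels (h1, h2, h3) are made up to stand in for the long household hashes, and only the first three rows of each frame are used; the variable name merged is likewise just for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical short labels replace the long household hashes from the question.
idx = pd.Index(["h1", "h2", "h3"], name="household")
cols = ["Children", "Movie enthusiast"]

# First three rows of each frame from the question.
df1 = pd.DataFrame([[0.086158, np.nan], [0.120285, 0.187764], [0.4, 0.114686]],
                   index=idx, columns=cols)
df2 = pd.DataFrame([[1.0, 0.0], [4.0, 11.0], [1.0, 5.0]],
                   index=idx, columns=cols)
df3 = pd.DataFrame([[np.nan, np.nan], [0.138, 0.037], [np.nan, 0.025]],
                   index=idx, columns=cols)

merged = (
    # keys puts the frame labels on the top column level
    pd.concat([df1, df2, df3], keys=["df1", "df2", "df3"], axis=1)
    # swap so the category is on top and the frame label is second
    .swaplevel(0, 1, axis=1)
    # sort so all columns of one category sit next to each other
    .sort_index(axis=1)
)

print(merged.columns.tolist())
```

Without sort_index the columns would stay in concatenation order (all of df1, then df2, then df3), so the three sources for one category would not be adjacent.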
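Once the frames are combined this way, it can be handy to pull pieces back out. The sketch below is one possible way to do that, using two tiny made-up frames (the values and the h1/h2 labels are assumptions, not the data from the question): plain column selection for one category, and DataFrame.xs for one source frame.

```python
import pandas as pd

# Tiny hypothetical frames, just to keep the example short.
idx = pd.Index(["h1", "h2"], name="household")
cols = ["Children", "Movie enthusiast"]
df1 = pd.DataFrame([[0.1, 0.2], [0.3, 0.4]], index=idx, columns=cols)
df2 = pd.DataFrame([[1.0, 0.0], [4.0, 11.0]], index=idx, columns=cols)

merged = (
    pd.concat([df1, df2], keys=["df1", "df2"], axis=1)
    .swaplevel(0, 1, axis=1)
    .sort_index(axis=1)
)

# All sources for one category: selecting a first-level label
# returns a frame whose columns are the second-level labels.
children = merged["Children"]

# One source across all categories: cross-section on the second level.
only_df2 = merged.xs("df2", axis=1, level=1)

print(children)
print(only_df2)
```

Here only_df2 should hold the same values as the original df2, which is a quick way to check that nothing was lost in the merge.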