Pandas: add a new second-level column to a column MultiIndex based on other columns
I have a DataFrame with a column MultiIndex:
System  A           B
Trial   Exp1  Exp2  Exp1  Exp2
1       NaN   1     2     3
2       4     5     NaN   NaN
3       6     NaN   7     8
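To reproduce the answers below, the frame above can be rebuilt like this. This is a sketch: the question only shows the printed frame, so the construction (including the level names System and Trial, taken from the printed header) is an assumption.

```python
import numpy as np
import pandas as pd

# Column MultiIndex: outer level "System", inner level "Trial"
columns = pd.MultiIndex.from_product(
    [["A", "B"], ["Exp1", "Exp2"]], names=["System", "Trial"]
)

# Values read off the printed frame; NaN marks a missing measurement
df = pd.DataFrame(
    [[np.nan, 1, 2, 3],
     [4, 5, np.nan, np.nan],
     [6, np.nan, 7, 8]],
    index=[1, 2, 3],
    columns=columns,
)
print(df)
```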
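It turns out that for each system (A, B) and each measurement (1, 2, 3 in the index), the results from Exp1 are always superior to those from Exp2. So for each system I want to generate a third column, call it Final, that takes Exp1 whenever it is available and defaults to Exp2 otherwise. The desired result is: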
System  A                  B
Trial   Exp1  Exp2  Final  Exp1  Exp2  Final
1       NaN   1     1      2     3     2
2       4     5     4      NaN   NaN   NaN
3       6     NaN   6      7     8     7
What is the best way to do this?
I tried using groupby on the columns:
grp = df.groupby(level=0, axis=1)
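and was thinking of combining transform or apply with assign to get there, but I couldn't find a working or efficient way. Specifically, I'm avoiding native Python for loops for efficiency reasons (otherwise the problem would be trivial).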
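For reference, this is roughly what the groupby route could look like. pandas 2.1 deprecated groupby(..., axis=1) and suggests transposing and grouping the rows instead, so this sketch groups df.T. It still loops, but once per system rather than once per row, and the fillna inside each iteration is vectorized. The df reconstruction is an assumption based on the printed frame.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame (values read off the printed output)
columns = pd.MultiIndex.from_product(
    [["A", "B"], ["Exp1", "Exp2"]], names=["System", "Trial"]
)
df = pd.DataFrame(
    [[np.nan, 1, 2, 3], [4, 5, np.nan, np.nan], [6, np.nan, 7, 8]],
    index=[1, 2, 3],
    columns=columns,
)

pieces = {}
# Transpose so the column MultiIndex becomes the row index, then group rows by
# System; this stands in for the deprecated df.groupby(level=0, axis=1)
for system, block in df.T.groupby(level="System"):
    sub = block.T.droplevel("System", axis=1)  # columns: Exp1, Exp2
    pieces[system] = sub.assign(Final=sub["Exp1"].fillna(sub["Exp2"]))

# The dict keys become the outer (System) column level again
result = pd.concat(pieces, axis=1, names=["System"])
print(result)
```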
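It doesn't feel particularly optimal, but try this:
for system in df.columns.levels[0]:
    # take Exp1 where it exists, otherwise fall back to Exp2
    df[(system, 'Final')] = df[(system, 'Exp1')].fillna(df[(system, 'Exp2')])
df = df.sort_index(axis=1)  # move each new Final column next to its system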
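- stack the first level of your column index with stack(0), leaving ['Exp1', 'Exp2'] in the column index
- use a lambda function that is applied to the whole DataFrame inside the assign call
- finally, unstack, swaplevel and sort_index to clean it up and put it back the way it was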
f = lambda x: x.Exp1.fillna(x.Exp2)  # x is the whole stacked DataFrame
df.stack(0).assign(Final=f).unstack() \
    .swaplevel(0, 1, axis=1).sort_index(axis=1)
   A                  B
   Exp1  Exp2  Final  Exp1  Exp2  Final
1  NaN   1.0   1.0    2.0   3.0   2.0
2  4.0   5.0   4.0    NaN   NaN   NaN
3  6.0   NaN   6.0    7.0   8.0   7.0
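Why a single fillna is enough: after stack(0) the System level moves into the row index, so x.Exp1 and x.Exp2 inside the lambda are single Series that cover every system at once. A small sketch to inspect that intermediate frame, assuming df is rebuilt from the question's printed values. Depending on the pandas version, stack may emit a FutureWarning about its new implementation, and the legacy implementation drops the all-NaN (2, B) row; unstack brings it back as NaN.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product(
    [["A", "B"], ["Exp1", "Exp2"]], names=["System", "Trial"]
)
df = pd.DataFrame(
    [[np.nan, 1, 2, 3], [4, 5, np.nan, np.nan], [6, np.nan, 7, 8]],
    index=[1, 2, 3],
    columns=columns,
)

# System moves from the columns into the row index: rows are (row, system)
stacked = df.stack(0)
print(stacked)

# One vectorized fillna computes Final for every system in a single call
final = stacked["Exp1"].fillna(stacked["Exp2"])
print(final)
```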
Another concept, using xs:
d1 = df.xs('Exp1', axis=1, level=1).fillna(df.xs('Exp2', axis=1, level=1))
d1.columns = [d1.columns, ['Final'] * len(d1.columns)]  # add the Final level
pd.concat([df, d1], axis=1).sort_index(axis=1)
   A                  B
   Exp1  Exp2  Final  Exp1  Exp2  Final
1  NaN   1.0   1.0    2.0   3.0   2.0
2  4.0   5.0   4.0    NaN   NaN   NaN
3  6.0   NaN   6.0    7.0   8.0   7.0
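In the xs answer, the d1.columns = [...] line is what attaches the missing Final level. An alternative sketch, assuming the same df, attaches that level with pd.concat and a dict key and then swaps it into second position:

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product(
    [["A", "B"], ["Exp1", "Exp2"]], names=["System", "Trial"]
)
df = pd.DataFrame(
    [[np.nan, 1, 2, 3], [4, 5, np.nan, np.nan], [6, np.nan, 7, 8]],
    index=[1, 2, 3],
    columns=columns,
)

# Exp1 with gaps filled from Exp2; the columns are just the systems (A, B)
d1 = df.xs("Exp1", axis=1, level=1).fillna(df.xs("Exp2", axis=1, level=1))

# The dict key becomes a new outer level ("Final"); swaplevel moves it inside
d1 = pd.concat({"Final": d1}, axis=1, names=["Trial"]).swaplevel(axis=1)

result = pd.concat([df, d1], axis=1).sort_index(axis=1)
print(result)
```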
Use stack to reshape, add the column with fillna, and then reshape back with unstack plus swaplevel and sort_index:
df = df.stack(level=0)
df['Final'] = df['Exp1'].fillna(df['Exp2'])  # fall back to Exp2, not Exp1
df = df.unstack().swaplevel(0, 1, axis=1).sort_index(axis=1)
print(df)
System  A                  B
Trial   Exp1  Exp2  Final  Exp1  Exp2  Final
1       NaN   1.0   1.0    2.0   3.0   2.0
2       4.0   5.0   4.0    NaN   NaN   NaN
3       6.0   NaN   6.0    7.0   8.0   7.0
Another solution: use xs to select the Exp1 and Exp2 DataFrames, create a new DataFrame with combine_first, add the missing second level with MultiIndex.from_product, and finally concat the DataFrames together:
a = df.xs('Exp1', axis=1, level=1)
b = df.xs('Exp2', axis=1, level=1)
df1 = a.combine_first(b)
df1.columns = pd.MultiIndex.from_product([df1.columns, ['Final']])
df = pd.concat([df, df1], axis=1).sort_index(axis=1)
print (df)
System  A                  B
Trial   Exp1  Exp2  Final  Exp1  Exp2  Final
1       NaN   1.0   1.0    2.0   3.0   2.0
2       4.0   5.0   4.0    NaN   NaN   NaN
3       6.0   NaN   6.0    7.0   8.0   7.0
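Here combine_first and fillna give the same result because a and b carry the same row and column labels. The difference only shows up when the labels differ: combine_first returns the union of both frames' labels, while fillna keeps the caller's shape. A sketch with a hypothetical extra system C that exists only in the fallback frame, assuming the same df:

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product(
    [["A", "B"], ["Exp1", "Exp2"]], names=["System", "Trial"]
)
df = pd.DataFrame(
    [[np.nan, 1, 2, 3], [4, 5, np.nan, np.nan], [6, np.nan, 7, 8]],
    index=[1, 2, 3],
    columns=columns,
)

a = df.xs("Exp1", axis=1, level=1)
b = df.xs("Exp2", axis=1, level=1)

# Same labels on both sides: the two methods agree
print(a.combine_first(b).equals(a.fillna(b)))  # True

# Hypothetical system "C" present only in the fallback frame
b_extra = b.assign(C=[9.0, 9.0, 9.0])
print(a.combine_first(b_extra).columns.tolist())  # ['A', 'B', 'C']
print(a.fillna(b_extra).columns.tolist())  # ['A', 'B']
```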
A similar solution with rename:
a = df.xs('Exp1', axis=1, level=1, drop_level=False)
b = df.xs('Exp2', axis=1, level=1, drop_level=False)
df1 = a.rename(columns={'Exp1':'Final'}).combine_first(b.rename(columns={'Exp2':'Final'}))
df = pd.concat([df, df1], axis=1).sort_index(axis=1)
print (df)
System  A                  B
Trial   Exp1  Exp2  Final  Exp1  Exp2  Final
1       NaN   1.0   1.0    2.0   3.0   2.0
2       4.0   5.0   4.0    NaN   NaN   NaN
3       6.0   NaN   6.0    7.0   8.0   7.0
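On a column MultiIndex, rename(columns={'Exp1': 'Final'}) relabels a matching label at whichever level it appears. If an outer-level label could ever collide with a trial name, passing level=1 restricts the rename to the Trial level. A sketch assuming the same df:

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product(
    [["A", "B"], ["Exp1", "Exp2"]], names=["System", "Trial"]
)
df = pd.DataFrame(
    [[np.nan, 1, 2, 3], [4, 5, np.nan, np.nan], [6, np.nan, 7, 8]],
    index=[1, 2, 3],
    columns=columns,
)

# Keep both levels so the result can be concatenated back onto df
a = df.xs("Exp1", axis=1, level=1, drop_level=False)

# Only labels on the second (Trial) level are renamed
a_final = a.rename(columns={"Exp1": "Final"}, level=1)
print(a_final.columns.tolist())  # [('A', 'Final'), ('B', 'Final')]
```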