How to compare a value of a single column over multiple columns in the same row using pandas?

Question

I have a dataframe that looks like this:

import numpy as np
import pandas as pd

np.random.seed(21)
df = pd.DataFrame(np.random.randn(8, 4), columns=['A', 'B1', 'B2', 'B3'])
# For each row, current_State copies the value of one of the B columns
df['current_State'] = [df['B1'][0], df['B1'][1], df['B2'][2], df['B2'][3], df['B3'][4], df['B3'][5], df['B1'][6], df['B2'][7]]
df

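I need to create a new column that contains, for each row, the name of the column whose value equals the value in 'current_State'. This is the desired output: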

I have tried many combinations of apply and lambda functions, but without success. Any help is very welcome!

Answer 1

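You can compare the current_State column with all the remaining columns to create a boolean mask, then use idxmax with axis=1 on the mask to get the name of the column whose value in the given row equals the corresponding value in current_State: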

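c = 'current_State'
# Keyword arguments are used because recent pandas versions no longer accept a positional axis here
df['new_column'] = df.drop(columns=c).eq(df[c], axis=0).idxmax(axis=1)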
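
To make the intermediate step visible, here is a minimal sketch of the same idea on a small hand-made frame. The data is hypothetical (it is not the frame from the question) and was chosen so that each row has exactly one exact match; the names mask and matched are only for illustration.

```python
import pandas as pd

# Hypothetical toy data: each current_State value is copied from one B column.
df = pd.DataFrame({
    "B1": [1.0, 5.0, 7.0],
    "B2": [2.0, 4.0, 8.0],
    "B3": [3.0, 6.0, 9.0],
    "current_State": [1.0, 4.0, 9.0],
})

c = "current_State"

# Row-wise comparison of every other column with current_State.
# axis=0 aligns the Series with the row index, so each row is
# compared with its own current_State value.
mask = df.drop(columns=c).eq(df[c], axis=0)
print(mask)

# idxmax(axis=1) returns the label of the first True in each row.
matched = mask.idxmax(axis=1)
print(matched.tolist())  # ['B1', 'B2', 'B3']
```

Note that if several columns match in the same row, idxmax returns only the first of them in column order.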

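If some rows may have no matching value, idxmax would still return the first column for them, so we can use this instead: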

c = 'current_State'
m = df.drop(columns=c).eq(df[c], axis=0)
# Rows where no column matches get NaN instead of a misleading column name
df['new_column'] = m.idxmax(axis=1).mask(~m.any(axis=1))

>>> df

          A        B1        B2        B3  current_State new_column
0 -0.051964 -0.111196  1.041797 -1.256739      -0.111196         B1
1  0.745388 -1.711054 -0.205864 -0.234571      -1.711054         B1
2  1.128144 -0.012626 -0.613200  1.373688      -0.613200         B2
3  1.610992 -0.689228  0.691924 -0.448116       0.691924         B2
4  0.162342  0.257229 -1.275456  0.064004       0.064004         B3
5 -1.061857 -0.989368 -0.457723 -1.984182      -1.984182         B3
6 -1.476442  0.231803  0.644159  0.852123       0.231803         B1
7 -0.464019  0.697177  1.567882  1.178556       1.567882         B2
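
The difference between the two variants only shows up when a row has no match, which never happens in the frame above. The following sketch uses hypothetical toy data (not from the question) with one such row; naive and safe are illustrative names.

```python
import pandas as pd

# Hypothetical toy data: the second row's current_State matches no column.
df = pd.DataFrame({
    "B1": [1.0, 5.0],
    "B2": [2.0, 4.0],
    "current_State": [2.0, 99.0],
})

c = "current_State"
m = df.drop(columns=c).eq(df[c], axis=0)

# Without the guard, an all-False row falls back to the first column label.
naive = m.idxmax(axis=1)

# With the guard, rows that have no True anywhere are replaced by NaN.
safe = naive.mask(~m.any(axis=1))

print(naive.tolist())  # ['B2', 'B1']
print(safe)
```

Here the second entry of naive is 'B1' even though nothing matched, while the corresponding entry of safe is missing.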
