How to find the second max from a pandas DataFrame (cosine similarity matrix)

Question

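How do I find the second max, or the max where index != column, in a pandas DataFrame (a cosine similarity matrix)? I could iterate over each column and check index != column, but I believe there is a better way...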

import pandas as pd
cos = pd.DataFrame([
    [ 1.        ,  0.17404038,  0.36849397],
    [ 0.17404038,  1.        ,  0.20505339],
    [ 0.36849397,  0.20505339,  1.        ]
    ])
cos.columns = ['A', 'B', 'C']
cos.index = ['A', 'B', 'C']

cos looks like this:

    A           B           C
A   1.000000    0.174040    0.368494
B   0.174040    1.000000    0.205053
C   0.368494    0.205053    1.000000

Excluding the cells whose value is 1, I would like the result to be:

    Col1    Col2
0   A       C
1   B       C
2   C       A

Can I do something like this and get the second max instead of the max?

results = cos.idxmax().reset_index()
results.columns = ['Col1', 'Col2']

results
    Col1    Col2
0   A       A
1   B       B
2   C       C

Answer 1

You can replace the 1s with NaN (this works because the diagonal entries are exactly 1.0) and then call idxmax and reset_index as before:

In [140]:
import numpy as np
cos.replace(1, np.nan).idxmax().reset_index()

Out[140]:
  index  0
0     A  C
1     B  C
2     C  A
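Replacing 1 relies on the diagonal being exactly 1.0, and it would also hide any off-diagonal pair whose similarity happens to be 1.0. A minimal sketch of the index != column idea from the question, done without a loop: compare the index labels with the column labels via NumPy broadcasting and blank out the matching cells with DataFrame.where. This assumes rows and columns carry the same labels, as in a similarity matrix; the names off_diag and masked are just illustrative.

```python
import numpy as np
import pandas as pd

# The cosine similarity matrix from the question.
cos = pd.DataFrame(
    [[1.0, 0.17404038, 0.36849397],
     [0.17404038, 1.0, 0.20505339],
     [0.36849397, 0.20505339, 1.0]],
    index=['A', 'B', 'C'], columns=['A', 'B', 'C'])

# (n, 1) row labels vs (1, n) column labels broadcast to an (n, n) mask
# that is True wherever index != column.
off_diag = cos.index.to_numpy()[:, None] != cos.columns.to_numpy()[None, :]

# where() keeps the off-diagonal cells and turns the rest into NaN,
# which idxmax skips by default.
masked = cos.where(off_diag)
result = masked.idxmax().reset_index()
result.columns = ['Col1', 'Col2']
print(result)
```

Because the mask is built from labels rather than values, it still works if a self-similarity is stored as 0.9999999 instead of exactly 1.0.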

And to give the columns the names you asked for:

In [141]:
new_df = cos.replace(1, np.nan).idxmax().reset_index()
new_df.columns=['Col1', 'Col2']
new_df

Out[141]:
  Col1 Col2
0    A    C
1    B    C
2    C    A
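If you literally want the second-largest entry of each column, Series.nlargest is another option. A sketch, assuming the self-similarity 1.0 is the strict largest value in every column (with ties, nlargest keeps the first occurrence by default, so the label it picks could differ):

```python
import pandas as pd

# The cosine similarity matrix from the question.
cos = pd.DataFrame(
    [[1.0, 0.17404038, 0.36849397],
     [0.17404038, 1.0, 0.20505339],
     [0.36849397, 0.20505339, 1.0]],
    index=['A', 'B', 'C'], columns=['A', 'B', 'C'])

# For each column, take the two largest values; the first one is the
# diagonal (self-similarity), so the label of the second is the answer.
second = cos.apply(lambda col: col.nlargest(2).index[-1])

result = second.reset_index()
result.columns = ['Col1', 'Col2']
print(result)
```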

Update

If you want to add the values too, you can call apply and use the new_df labels to look them up in the cos df:

In [144]:
new_df['value'] = new_df.apply(lambda x: cos.loc[x['Col1'], x['Col2']], axis=1)
new_df

Out[144]:
  Col1 Col2     value
0    A    C  0.368494
1    B    C  0.205053
2    C    A  0.368494

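In fact, you could use lookup here. Note that DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0; on current versions the equivalent is positional NumPy indexing with get_indexer:

In [146]:
new_df['value'] = cos.to_numpy()[cos.index.get_indexer(new_df['Col1']),
                                 cos.columns.get_indexer(new_df['Col2'])]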
new_df

Out[146]:
  Col1 Col2     value
0    A    C  0.368494
1    B    C  0.205053
2    C    A  0.368494
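Since the value being looked up is just the column maximum once the diagonal is hidden, you can also skip the lookup and take idxmax and max from the same masked frame. A sketch assuming a label-based diagonal mask (rows and columns share labels); masked is an illustrative name:

```python
import pandas as pd

# The cosine similarity matrix from the question.
cos = pd.DataFrame(
    [[1.0, 0.17404038, 0.36849397],
     [0.17404038, 1.0, 0.20505339],
     [0.36849397, 0.20505339, 1.0]],
    index=['A', 'B', 'C'], columns=['A', 'B', 'C'])

# Hide self-similarity cells by comparing index labels with column labels.
masked = cos.where(cos.index.to_numpy()[:, None] != cos.columns.to_numpy())

# idxmax and max over the same masked frame give the label and value together.
new_df = pd.DataFrame({
    'Col1': masked.columns,
    'Col2': masked.idxmax().to_numpy(),
    'value': masked.max().to_numpy(),
})
print(new_df)
```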

Answer 2

Why not use the rank method to get the ranking within every column?

>>> ranking = cos.rank(ascending=False)
>>> ranking
     A    B    C
A  1.0  3.0  2.0
B  3.0  1.0  3.0
C  2.0  2.0  1.0
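To turn the ranking into the answer: rank 1 in every column is the diagonal, so the row with rank 2 is the second max. A sketch; method='first' is an addition not in the answer above, so tied values still get distinct whole-number ranks (with the default 'average', a tie could produce 2.5 and no cell would equal 2):

```python
import pandas as pd

# The cosine similarity matrix from the question.
cos = pd.DataFrame(
    [[1.0, 0.17404038, 0.36849397],
     [0.17404038, 1.0, 0.20505339],
     [0.36849397, 0.20505339, 1.0]],
    index=['A', 'B', 'C'], columns=['A', 'B', 'C'])

# Rank each column in descending order; ties are broken by position.
ranking = cos.rank(ascending=False, method='first')

# In each column, pick the row label whose rank is 2 (the second max).
second = ranking.apply(lambda col: col[col == 2].index[0])

result = second.reset_index()
result.columns = ['Col1', 'Col2']
print(result)
```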
