Get the highest occurrence between 2 columns

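I have 2 columns, called decision1 and decision2.

I want to compare them and get the value that occurs most often across the two, so whichever count is largest, I get the most frequent value whether it comes from decision1 or decision2. My attempt so far is below, but it does not work, because it only gives me the most frequent value in EACH column rather than in both combined.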

# weightage option
if args['weightage'] == "yes":
    attr1 = data['decision'].value_counts().idxmax()   # most frequent value in decision
    attr2 = data['decision2'].value_counts().idxmax()  # most frequent value in decision2
    heaviest_attribute = data.groupby(['decision', 'decision2']).size()

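Ideally, I just need some kind of max() function between attr1 and attr2, but I don't know how to go about it.

For example, given this table

I want to compare the decision1 and decision2 columns as if they were a single column. In this case the expected output would be 'Yes', because it is the value that occurs most often.

There may be a more elegant solution...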

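import pandas as pd

df = pd.DataFrame({'decision': ['Yes', 'Maybe', 'Yes', 'Maybe', 'Yes'],
                   'decision 2': ['No', 'No', 'Perhaps', 'Perhaps', 'unsure']})

# Count the occurrences of each value, per column
d1 = dict(df.groupby('decision').count().loc[:, 'decision 2'])
d2 = dict(df.groupby('decision 2').count().loc[:, 'decision'])

# Merge the counts; update() overwrites keys present in both dicts,
# so this assumes the two columns share no values
d1.update(d2)

# Value with the highest count
max(d1, key=d1.get)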
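
When the same value can appear in both columns, the per-column counts have to be summed per value instead of merged. A minimal sketch of one way to do that, using value_counts and Series.add with fill_value=0. The data and the names df_shared, c1, c2, combined and most_common are made up for illustration:

```python
import pandas as pd

# Hypothetical data in which 'No' and 'Yes' appear in both columns
df_shared = pd.DataFrame({'decision': ['Yes', 'No', 'Yes', 'No', 'Maybe'],
                          'decision 2': ['No', 'No', 'Perhaps', 'Yes', 'No']})

c1 = df_shared['decision'].value_counts()
c2 = df_shared['decision 2'].value_counts()

# Add the counts label by label; a value missing from one column counts as 0 there
combined = c1.add(c2, fill_value=0)

# Label with the largest combined count
most_common = combined.idxmax()
print(most_common)
```

Here 'No' occurs twice in the first column and three times in the second, so the combined count is 5 and 'No' is the result.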

Here is a simple solution.

It is best to convert the columns to a list; finding the most frequent item in a list is straightforward.

import pandas as pd

data = pd.DataFrame({'decision': ['yes', 'maybe', 'yes', 'maybe', 'yes'],
                     'decision 2': ['No', 'No', 'Perhaps', 'Perhaps', 'unsure']})

# Concatenate both columns into one list
a = list(data['decision']) + list(data['decision 2'])
# Pick the distinct value with the highest count in the list
a = max(set(a), key=a.count)
print(a)

Output:

yes
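
Note that max(set(a), key=a.count) rescans the whole list once per distinct value. For larger data, collections.Counter from the standard library counts everything in a single pass. A sketch of that variant on the same data; the names counts, value and n are only illustrative:

```python
from collections import Counter

import pandas as pd

data = pd.DataFrame({'decision': ['yes', 'maybe', 'yes', 'maybe', 'yes'],
                     'decision 2': ['No', 'No', 'Perhaps', 'Perhaps', 'unsure']})

# Count every value from both columns in a single pass
counts = Counter(data['decision'].tolist() + data['decision 2'].tolist())

# most_common(1) returns a list holding one (value, count) pair
value, n = counts.most_common(1)[0]
print(value, n)
```

This also gives the count itself (3 for 'yes' here), which the max() version does not return.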

Use DataFrame.melt with Series.mode, then select the first value by position with Series.iat:

a = df[['decision','decision 2']].melt()['value'].mode().iat[0]

Or reshape with DataFrame.stack:

a = df[['decision','decision 2']].stack().mode().iat[0]

print (a)
Yes

Details:

print (df[['decision','decision 2']].melt()['value'])
0        Yes
1      Maybe
2        Yes
3      Maybe
4        Yes
5         No
6         No
7    Perhaps
8    Perhaps
9     unsure
Name: value, dtype: object
print (df[['decision','decision 2']].stack())
0  decision          Yes
   decision 2         No
1  decision        Maybe
   decision 2         No
2  decision          Yes
   decision 2    Perhaps
3  decision        Maybe
   decision 2    Perhaps
4  decision          Yes
   decision 2     unsure
dtype: object
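
One thing to keep in mind with .iat[0]: Series.mode returns every value tied for the highest count, in sorted order, so .iat[0] picks the first of them in sort order rather than by column or row position. A small sketch with made-up data (df_tie, modes and first are hypothetical names) in which three values tie:

```python
import pandas as pd

# Hypothetical data: 'Yes', 'No' and 'Maybe' each occur twice
df_tie = pd.DataFrame({'decision': ['Yes', 'Yes', 'No'],
                       'decision 2': ['No', 'Maybe', 'Maybe']})

# mode() keeps all tied values and returns them sorted
modes = df_tie.melt()['value'].mode()
print(modes.tolist())

# .iat[0] therefore returns the first value in sort order
first = modes.iat[0]
print(first)
```

If ties matter, it may be safer to inspect the whole result of mode() instead of only its first element.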

Edit: to get the name of the first column that contains this value:

s = df.eq(a).any()

col = s.index[s][0]
print (col)
decision
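
The lookup above returns the first column that contains the value at all. If the value can occur in both columns and the goal is the column where it occurs most often, one possible variant is to sum the boolean matches per column and take idxmax. This is a sketch, and per_column and col_max are names introduced here:

```python
import pandas as pd

df = pd.DataFrame({'decision': ['Yes', 'Maybe', 'Yes', 'Maybe', 'Yes'],
                   'decision 2': ['No', 'No', 'Perhaps', 'Perhaps', 'unsure']})

# Most frequent value across both columns
a = df[['decision', 'decision 2']].melt()['value'].mode().iat[0]

# Boolean frame marking where the value occurs, summed per column
per_column = df.eq(a).sum()

# Column in which the value occurs most often
col_max = per_column.idxmax()
print(col_max)
```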