Python

Question

For each value in column cluster-1, I need the most similar value in column cluster-2, that is, the one with the highest count.

Input - data

Output - data

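I used the command df.groupby(['cluster-1','cluster-2'])['cluster-2'].count(), which gives me the number of occurrences of each cluster-2 value within each cluster-1 group. I would appreciate advice on how to proceed. Thanks.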

Answer 1

Use SeriesGroupBy.value_counts: it sorts the counts in descending order by default, so you can convert the resulting MultiIndex to a DataFrame with MultiIndex.to_frame and then keep only the first (most frequent) row per cluster-1 with DataFrame.drop_duplicates:

df1 = (df.groupby(['cluster-1'])['cluster-2']
         .value_counts()
         .index
         .to_frame(index=False)
         .drop_duplicates('cluster-1'))
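
As a rough illustration of the approach above, here is a minimal, self-contained sketch. The sample values in df are hypothetical, since the original input table is not shown in the post; the variable names counts and with_counts are also just for illustration.

```python
import pandas as pd

# Hypothetical sample data (the question's real input is not shown).
df = pd.DataFrame({
    "cluster-1": [1, 1, 1, 2, 2, 2, 2],
    "cluster-2": ["a", "a", "b", "c", "d", "d", "d"],
})

# Counts of cluster-2 values within each cluster-1 group,
# sorted in descending order inside every group.
counts = df.groupby("cluster-1")["cluster-2"].value_counts()

# Keep only the index levels and take the first row per cluster-1,
# which is the most frequent cluster-2 value for that group.
df1 = (counts.index
             .to_frame(index=False)
             .drop_duplicates("cluster-1")
             .reset_index(drop=True))

# Optional variant that also keeps the count; "n" is an arbitrary column name.
with_counts = (counts.reset_index(name="n")
                     .drop_duplicates("cluster-1")
                     .reset_index(drop=True))

print(df1)
print(with_counts)
```

The sample data deliberately has no ties. If two cluster-2 values share the top count within a group, drop_duplicates keeps whichever one value_counts happens to list first.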

Python - pandas, group by and max count

group-by

pandas

pandas-groupby