How can I find the frequency?

Question

I have this DataFrame. How can I find the 3 most frequently occurring numbers in column b?

import pandas as pd
import numpy as np
df = pd.DataFrame({"a": [1,2,2,4,2,3], "b": [np.nan, np.nan, '2,3', 3, '3,5,1',2]})

I guess the answer should be 3, 2, 5 or 3, 2, 1.

Answer 1

Split column b on the delimiter `,`, then use explode to turn each list element into its own row, and finally use value_counts + head to get the 3 most frequent elements:

df['b'].dropna().astype(str).str.split(',')\
       .explode().value_counts().head(3).index.tolist()

explode is available in pandas >= 0.25. For pandas < 0.25, use:

pd.value_counts(np.hstack(df['b'].dropna().astype(str).str.split(','))).head(3).index.tolist()

['3', '2', '5']
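Since the question asks about frequencies, it may help to see the chain above broken into steps with the counts kept alongside the values. This is only a sketch of the same idea, not a different method; the intermediate names (`as_text`, `tokens`, `flat`, `counts`, `top3`) are mine, and the `.str.strip()` call is an extra precaution I added in case the real data has spaces after the commas.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 2, 4, 2, 3],
                   "b": [np.nan, np.nan, "2,3", 3, "3,5,1", 2]})

# Step 1: drop missing values and turn every entry into a string,
# so that the plain numbers 3 and 2 can be split like the others.
as_text = df["b"].dropna().astype(str)

# Step 2: split on "," so each row holds a list of tokens.
tokens = as_text.str.split(",")

# Step 3: one token per row (needs pandas >= 0.25).
# strip() is an added precaution against stray spaces such as "3, 5".
flat = tokens.explode().str.strip()

# Step 4: count each token; value_counts sorts by count, descending.
counts = flat.value_counts()

# Keep the three most frequent tokens together with their counts.
top3 = counts.head(3)
print(top3)

# The tokens are strings; convert if integers are wanted
# (assumes every token is a whole number).
top3_values = top3.index.astype(int).tolist()
```

Here '3' occurs 3 times and '2' twice, while '5' and '1' occur once each. That tie is why both 3, 2, 5 and 3, 2, 1 are reasonable answers: which of the two lands in third place depends on how value_counts orders equal counts.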

Answer 2

Use a combination of pandas and Python's collections.Counter:

from collections import Counter

a = list(dict(Counter(df.b.dropna().astype(str).str.split(',').sum()).most_common(3))
                                   .keys())

In [132]: a
Out[132]: ['3', '2', '5']
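A possible variation on the Counter approach, in case the counts themselves are needed or the column is long. It flattens the lists with `itertools.chain.from_iterable` rather than `.sum()`, which avoids concatenating lists over and over, and converts the tokens to integers. The names `token_lists`, `freq` and `top3_with_counts` are mine, and the `int()` conversion assumes every token is a whole number.

```python
from collections import Counter
from itertools import chain

import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 2, 4, 2, 3],
                   "b": [np.nan, np.nan, "2,3", 3, "3,5,1", 2]})

# Same preparation as above: drop NaN, stringify, split on ",".
token_lists = df["b"].dropna().astype(str).str.split(",")

# Flatten the lists in one pass and count integer values.
# int() tolerates surrounding whitespace, e.g. int(" 3") == 3.
freq = Counter(int(tok) for tok in chain.from_iterable(token_lists))

# most_common returns (value, count) pairs, highest count first.
top3_with_counts = freq.most_common(3)

# Just the values, if the counts are not needed.
top3 = [value for value, _ in top3_with_counts]
print(top3_with_counts)
```

The first two pairs are (3, 3) and (2, 2); the third slot goes to one of the values that appear only once.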

