Pandas: Sort the column on frequency by another column having same value grouped

Question

My DataFrame is grouped by column y and sorted by the count column, which holds the number of rows that share each y value.

Code:

df['count'] = df.groupby('y')['y'].transform('size')  # number of rows per y value
df = df.sort_values('count', ascending=False)  # df.sort(...) on pandas < 0.17

Output:

x   y   count
1   a   4
3   a   4
2   a   4
1   a   4
2   c   3
1   c   3
2   c   3
2   b   2
1   b   2

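Now, within each group of equal y values, I want to sort the rows by how often each x value occurs in that group (most frequent first), like this: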

Expected output:

x   y   count
1   a   4
1   a   4
2   a   4
3   a   4
2   c   3
2   c   3
1   c   3
2   b   2
1   b   2
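The desired ordering can be expressed as plain sort keys. The sketch below is one possible approach, not taken from either answer: a hypothetical helper column `x_freq` holds how often each x occurs within its y group, and the rows are sorted by count (descending), y (to keep groups with equal counts together), `x_freq` (descending) and finally x (ascending). It assumes pandas 0.17 or later for `sort_values`, and the sample data is rebuilt from the output table above. Ties in frequency are broken by x ascending, so group b comes out as 1, 2 rather than the 2, 1 shown.

```python
import pandas as pd

# Sample rows rebuilt from the question's "Output" table (assumption:
# the original, unsorted DataFrame is not shown in the question)
df = pd.DataFrame({
    "x": [1, 3, 2, 1, 2, 1, 2, 2, 1],
    "y": ["a", "a", "a", "a", "c", "c", "c", "b", "b"],
})

# Rows per y value (the question's "count" column)
df["count"] = df.groupby("y")["y"].transform("size")

# Hypothetical helper column: how often each x occurs within its y group
df["x_freq"] = df.groupby(["y", "x"])["x"].transform("size")

# count descending, y to keep equal-count groups together,
# x frequency descending, then x ascending to break ties
by_freq = df.sort_values(
    ["count", "y", "x_freq", "x"], ascending=[False, True, False, True]
).drop(columns="x_freq")

print(by_freq.to_string(index=False))
```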

Answer 1

It seems you need groupby with value_counts, and then numpy.repeat to expand each index value by its count back into a DataFrame:

import numpy as np
import pandas as pd
s = df.groupby('y', sort=False)['x'].value_counts()
# alternative:
# s = df.groupby('y', sort=False)['x'].apply(pd.Series.value_counts)
print(s)
y  x
a  1    2
   2    1
   3    1
c  2    2
   1    1
b  1    1
   2    1
Name: x, dtype: int64

# repeat each (y, x) index tuple by its count
df1 = pd.DataFrame(np.repeat(s.index.values, s.values).tolist(), columns=['y','x'])
# change order of columns (reindex_axis was removed in pandas 1.0)
df1 = df1.reindex(columns=['x','y'])
print(df1)
   x  y
0  1  a
1  1  a
2  2  a
3  3  a
4  2  c
5  2  c
6  1  c
7  1  b
8  2  b
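The expansion step can be checked in isolation. The sketch below rebuilds the counts Series printed above by hand instead of calling value_counts, because the ordering of ties and groups from `groupby(...).value_counts()` may differ between pandas versions. It then expands the counts with numpy.repeat and uses `reindex(columns=...)` in place of the removed `reindex_axis`.

```python
import numpy as np
import pandas as pd

# Counts copied by hand from the value_counts output printed above
s = pd.Series(
    [2, 1, 1, 2, 1, 1, 1],
    index=pd.MultiIndex.from_tuples(
        [("a", 1), ("a", 2), ("a", 3), ("c", 2), ("c", 1), ("b", 1), ("b", 2)],
        names=["y", "x"],
    ),
)

# Repeat each (y, x) tuple as many times as it was counted
rows = np.repeat(s.index.values, s.values).tolist()
df1 = pd.DataFrame(rows, columns=["y", "x"])

# Put x before y, as in the question
df1 = df1.reindex(columns=["x", "y"])
print(df1)
```

Because the rows are generated from the counts, any extra columns of the original DataFrame (such as count) are not carried over.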

Answer 2

If you are using an older pandas version that does not support df.sort_values, you can use:

df.sort(columns=['count','x'], ascending=[False,True])
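On pandas 0.17 and later, where DataFrame.sort was deprecated in favour of sort_values (and later removed), the same call can be sketched as follows, with sample data rebuilt from the question's output table. Note that this sorts by the x value itself rather than by its frequency, so it matches the expected output for group a but gives 1, 2, 2 for group c instead of 2, 2, 1.

```python
import pandas as pd

# Sample rows rebuilt from the question's "Output" table (assumption:
# the original, unsorted DataFrame is not shown in the question)
df = pd.DataFrame({
    "x": [1, 3, 2, 1, 2, 1, 2, 2, 1],
    "y": ["a", "a", "a", "a", "c", "c", "c", "b", "b"],
})
df["count"] = df.groupby("y")["y"].transform("size")

# sort_values equivalent of df.sort(columns=[...], ascending=[...]):
# count descending, then the x value ascending
out = df.sort_values(["count", "x"], ascending=[False, True])
print(out.to_string(index=False))
```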

Tags: python, sorting, frequency, pandas