In Pandas I have a dataframe where several columns define a configuration. I want to identify the rows that share an identical configuration.

import pandas as pd

df = pd.DataFrame({'id': [101, 102, 103, 104, 105, 106, 107],
                   'color': ['blue', 'blue', 'blue', 'red', 'blue', 'red', 'blue'],
                   'location': ['there', 'here', 'there', 'here', 'here', 'there', 'here']})

df

Out[12]:

    id color location
0  101  blue    there
1  102  blue     here
2  103  blue    there
3  104   red     here
4  105  blue     here
5  106   red    there
6  107  blue     here

I want to create a column that labels each distinct combination of color and location, like this:

    id color location group
0  101  blue    there     A
1  102  blue     here     B
2  103  blue    there     A
3  104   red     here     C
4  105  blue     here     B
5  106   red    there     D
6  107  blue     here     B

This looks like a job for groupby().ngroup(). Passing sort=False numbers the groups in order of first appearance:

df['group'] = df.groupby(['color','location'], sort=False).ngroup()

Output:

    id color location  group
0  101  blue    there      0
1  102  blue     here      1
2  103  blue    there      0
3  104   red     here      2
4  105  blue     here      1
5  106   red    there      3
6  107  blue     here      1
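
The question asked for letter labels (A, B, C, D) rather than integers. A minimal sketch of one way to get there, assuming there are at most 26 distinct groups so that a single uppercase letter is enough (the `codes` name is only for illustration):

```python
import string

import pandas as pd

df = pd.DataFrame({'id': [101, 102, 103, 104, 105, 106, 107],
                   'color': ['blue', 'blue', 'blue', 'red', 'blue', 'red', 'blue'],
                   'location': ['there', 'here', 'there', 'here', 'here', 'there', 'here']})

# Integer group numbers, assigned in order of first appearance
codes = df.groupby(['color', 'location'], sort=False).ngroup()

# Map 0 -> 'A', 1 -> 'B', ... (assumption: no more than 26 groups)
df['group'] = codes.map(lambda i: string.ascii_uppercase[i])

print(df)
```

With more than 26 groups the lookup would raise an IndexError, so a different labeling scheme would be needed in that case.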

I would use factorize on the joined key columns:

df[['color','location']].agg(','.join, axis=1).factorize()[0]
Out[12]: array([0, 1, 0, 2, 1, 3, 1], dtype=int64)
# To store the result as a column:
# df['group'] = df[['color','location']].agg(','.join, axis=1).factorize()[0]
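
One caveat with joining on ',' is that it only works for string columns, and two different configurations could collide if a value itself contains a comma. A hedged variant, not from the original answer, is to factorize row tuples instead of joined strings (the `keys` name is only for illustration):

```python
import pandas as pd

df = pd.DataFrame({'id': [101, 102, 103, 104, 105, 106, 107],
                   'color': ['blue', 'blue', 'blue', 'red', 'blue', 'red', 'blue'],
                   'location': ['there', 'here', 'there', 'here', 'here', 'there', 'here']})

# Build one hashable tuple per row from the configuration columns
keys = df[['color', 'location']].apply(tuple, axis=1)

# factorize() returns (codes, uniques); codes follow order of first appearance
df['group'] = keys.factorize()[0]

print(df)
```

This is likely slower than ngroup() on large frames because it builds a Python tuple per row, so it is mainly useful when the key columns are not all strings.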