Python pandas: add index column based on existing columns, with duplicates sharing the same index

Question

I want to add an index column based on existing columns. Duplicate rows should share the same index. For example:

[image: example table, not preserved]

If two rows have the same values in the ['old_index', 'year'] columns, they should get the same new index. The values in the 'num' column do not matter.

Could anyone help with this? Thank you very much!

Answer 1


# Number each distinct combination of column values in order of first appearance (1-based).
df['new_id'] = df.groupby(df.columns.tolist(), sort=False).ngroup() + 1
df


   index  year  id  new_id
0      1  2000   5       1
1      2  1996   3       2
2      2  1996   3       2
3      4  1994   2       3
4      4  1999   4       4
5      4  1999   4       4
6     12  1989   1       5
7     12  1989   1       5
8     12  1985   0       6
9     12  2011   6       7
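
To reproduce the table above, here is a self-contained sketch. The input DataFrame is rebuilt from the values shown in the output, which is an assumption about what the original data looked like.

```python
import pandas as pd

# Assumed input: values copied from the output table (columns index, year, id).
df = pd.DataFrame({
    "index": [1, 2, 2, 4, 4, 4, 12, 12, 12, 12],
    "year": [2000, 1996, 1996, 1994, 1999, 1999, 1989, 1989, 1985, 2011],
    "id": [5, 3, 3, 2, 4, 4, 1, 1, 0, 6],
})

# sort=False keeps groups in order of first appearance.
# ngroup() numbers groups from 0, so add 1 for a 1-based id.
df["new_id"] = df.groupby(df.columns.tolist(), sort=False).ngroup() + 1
print(df)
```

Without sort=False, the groups would be numbered by sorted key order rather than by the order in which they first appear.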

Give it a try, and let me know if it does not do exactly what you need.
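
One caveat: the question says only 'old_index' and 'year' should decide the new index, and 'num' should be ignored. Grouping on every column would split rows that differ only in 'num', so a possible adjustment is to pass just the key columns to groupby. A minimal sketch, where the sample values and the output column name new_index are made up for illustration and only the three input column names come from the question:

```python
import pandas as pd

# Hypothetical sample data; only the column names are taken from the question.
df = pd.DataFrame({
    "old_index": [1, 2, 2, 4, 4, 4],
    "year": [2000, 1996, 1996, 1994, 1999, 1999],
    "num": [7, 8, 9, 1, 2, 3],  # differs within duplicates, must not affect the result
})

# Group only on the columns that define a duplicate.
key_cols = ["old_index", "year"]
df["new_index"] = df.groupby(key_cols, sort=False).ngroup() + 1

print(df["new_index"].tolist())  # [1, 2, 2, 3, 4, 4]
```

Rows 1 and 2 share (2, 1996) and rows 4 and 5 share (4, 1999), so each pair gets one index even though their 'num' values differ.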

python

dataframe

pandas

data-science