Pandas - Create groups with unique values in them
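Is it possible to split a pandas DataFrame into groups such that the values of two columns are unique within each group? The name column is the primary key.
Input: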
| name | num_1 | num_2 |
|--------|-------|--------|
| name_1 | 5 | 12 |
| name_2 | 5 | 12 |
| name_3 | 5 | 12 |
| name_4 | 7 | 14 |
| name_5 | 7 | 14 |
| name_6 | 8 | 14 |
| name_7 | 8 | 14 |
| name_8 | 9 | 13 |
| name_9 | 9 | 13 |
| name_10| 9 | 13 |
Output:
| name | num_1 | num_2 | group_id |
|--------|-------|--------|----------|
| name_1 | 5 | 12 | 1 |
| name_2 | 5 | 12 | 2 |
| name_3 | 5 | 12 | 3 |
| name_4 | 7 | 14 | 1 |
| name_5 | 7 | 14 | 2 |
| name_6 | 8 | 14 | 3 |
| name_7 | 8 | 14 | 4 |
| name_8 | 9 | 13 | 1 |
| name_9 | 9 | 13 | 2 |
| name_10| 9 | 13 | 3 |
Basically, the values of num_1 and num_2 must not repeat within a group. Is there a way to do this in pandas?
Use groupby with GroupBy.cumcount:
    In [1450]: df['group_id'] = df.groupby('num_2').cumcount() + 1

    In [1451]: df
    Out[1451]:
          name  num_1  num_2  group_id
    0   name_1      5     12         1
    1   name_2      5     12         2
    2   name_3      5     12         3
    3   name_4      7     14         1
    4   name_5      7     14         2
    5   name_6      8     14         3
    6   name_7      8     14         4
    7   name_8      9     13         1
    8   name_9      9     13         2
    9  name_10      9     13         3
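
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the example frame, applies the same `cumcount` step, and then checks whether either column repeats inside a group. The helper `has_duplicates_within_groups` and the small `counter` frame are hypothetical, added only for illustration. Note that numbering rows per `num_2` value only guarantees that `num_2` is unique within a group; `num_1` happens to be unique as well for this particular data, which is worth verifying on your own data.

```python
import pandas as pd

# Rebuild the example input from the question.
df = pd.DataFrame({
    "name": [f"name_{i}" for i in range(1, 11)],
    "num_1": [5, 5, 5, 7, 7, 8, 8, 9, 9, 9],
    "num_2": [12, 12, 12, 14, 14, 14, 14, 13, 13, 13],
})

# Number the rows within each num_2 value, starting at 1.
df["group_id"] = df.groupby("num_2").cumcount() + 1


def has_duplicates_within_groups(frame, column, group_col="group_id"):
    """Hypothetical helper: True if a value of `column` repeats inside a group."""
    return bool(frame.duplicated(subset=[group_col, column]).any())


num_1_repeats = has_duplicates_within_groups(df, "num_1")
num_2_repeats = has_duplicates_within_groups(df, "num_2")
print(num_1_repeats, num_2_repeats)

# Hypothetical counterexample: the same num_1 paired with two different
# num_2 values lands in the same group, so num_1 repeats there.
counter = pd.DataFrame({"num_1": [5, 5], "num_2": [12, 13]})
counter["group_id"] = counter.groupby("num_2").cumcount() + 1
counter_num_1_repeats = has_duplicates_within_groups(counter, "num_1")
print(counter_num_1_repeats)
```

If the check reports a repeat for `num_1`, grouping on `num_2` alone is not enough for that data and a different assignment strategy would be needed.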