Set row index in pandas based on whether another row's value has already been indexed or not

Question

Here is what I want to accomplish with Pandas:

Suppose we have a Pandas DataFrame like this:

     transaction_code
1    4373-36
2    3626-68
3    3626-68
4    3281-23
5    4721-44
...
101  6273-56
102  2836-78
103  1657-28
104  3281-23
105  5323-64

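I want to create a new column named 'transaction_code_new_index' that holds an index similar to the existing one, except that whenever a transaction_code repeats (for example, the code 6273-75 may appear 3 times), all rows with that code should share the same index (that is, every row whose transaction_code is 6273-75 must get the same index).

Example: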

     transaction_code transaction_code_new_index
1    4373-36          1
2    3626-68          2
3    3626-68          2 (because 3626-68 has already been indexed before)
4    3281-23          3
5    4721-44          4
...
101  6273-56          100
102  2836-78          101
103  1657-28          102
104  3281-23          3 (because 3281-23 has already been indexed before)
105  5323-64          103

Thanks.

Answer 1

You can take the minimum index of each group. Using transform broadcasts that value back to every row of the group.

df['new_index'] = df.groupby('transaction_code')['transaction_code'].transform(lambda x: x.index.min())

Output

  transaction_code  new_index
1          4373-36          1
2          3626-68          2
3          3626-68          2
4          3281-23          4
5          4721-44          5
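Note that the output above reuses the index label of each code's first occurrence (1, 2, 2, 4, 5), whereas the example in the question numbers the distinct codes consecutively (1, 2, 2, 3, 4). If the consecutive numbering is what is needed, one possible alternative is pd.factorize, which labels values in order of first appearance. The following is only a sketch: the five-row sample frame is a hypothetical reconstruction of the first rows from the question, and the column name transaction_code_new_index is taken from the question.

```python
import pandas as pd

# Hypothetical sample: the first five rows shown in the question.
df = pd.DataFrame(
    {"transaction_code": ["4373-36", "3626-68", "3626-68", "3281-23", "4721-44"]},
    index=[1, 2, 3, 4, 5],
)

# Approach from the answer: every row gets the smallest index label of its group.
df["new_index"] = df.groupby("transaction_code")["transaction_code"].transform(
    lambda x: x.index.min()
)

# Alternative sketch: factorize assigns 0, 1, 2, ... in order of first
# appearance; adding 1 makes the numbering start at 1 as in the question.
codes, uniques = pd.factorize(df["transaction_code"])
df["transaction_code_new_index"] = codes + 1

print(df)
# new_index:                  1, 2, 2, 4, 5
# transaction_code_new_index: 1, 2, 2, 3, 4
```

The two columns differ from the first row after a duplicate onward: the groupby version leaves gaps where duplicates occurred, while the factorize version stays dense.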


python

indexing

duplicates

dataframe

pandas