Reset a recurring MultiIndex in pandas

Question

I have a pandas DataFrame in Python that was built with pd.concat and has a recurring MultiIndex:

        customer_id
0   0   46841769
    1   4683936
1   0   8880872
    1   8880812
0   0   8880873
    1   1000521
1   0   1135488
    1   5388773

Now I would like to reset only the first level of the MultiIndex, so that it becomes a running number. Like this:

        customer_id
0   0   46841769
    1   4683936
1   0   8880872
    1   8880812
2   0   8880873
    1   1000521
3   0   1135488
    1   5388773

In general I have about 5 million records and not the largest machine, so I am looking for a memory-efficient solution.

ignore_index=True in pd.concat does not work, because then I lose the MultiIndex.

Many thanks

Answer 1

You can convert the first level with get_level_values and to_series, then compare it with its shifted values, apply cumsum to number the changes, and finally rebuild the index with MultiIndex.from_arrays:

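import pandas as pd

# First index level as a Series, so that shift and cumsum are available
a = df.index.get_level_values(0).to_series()
# Mark every row where the value changes, then number the blocks from 0
a = a.ne(a.shift()).cumsum() - 1

# New first level combined with the unchanged second level
mux = pd.MultiIndex.from_arrays([a, df.index.get_level_values(1)], names=df.index.names)

df.index = mux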

Or:

df = df.set_index(mux)

print(df)
     customer_id
0 0     46841769
  1      4683936
1 0      8880872
  1      8880812
2 0      8880873
  1      1000521
3 0      1135488
  1      5388773
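
For anyone who wants to try this end to end, here is a minimal self-contained sketch. It assumes the data came from concatenating two frames that share the same two-level index; the names part1, part2, first, changed and new_first are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical reconstruction of the question's data: two frames whose
# first index level restarts at 0 in each piece.
idx = pd.MultiIndex.from_product([[0, 1], [0, 1]])
part1 = pd.DataFrame({"customer_id": [46841769, 4683936, 8880872, 8880812]}, index=idx)
part2 = pd.DataFrame({"customer_id": [8880873, 1000521, 1135488, 5388773]}, index=idx)
df = pd.concat([part1, part2])

first = df.index.get_level_values(0).to_series()
# True wherever the first-level value differs from the previous row;
# the first row always counts as a change because shift() puts NaN there.
changed = first.ne(first.shift())
# Running count of changes, moved down by one so numbering starts at 0.
new_first = changed.cumsum() - 1

df.index = pd.MultiIndex.from_arrays(
    [new_first, df.index.get_level_values(1)], names=df.index.names
)
print(df.index.get_level_values(0).tolist())  # [0, 0, 1, 1, 2, 2, 3, 3]
```

Note that this approach only detects a new block when the first-level value changes from one row to the next. If two adjacent blocks happened to carry the same first-level value, they would be merged into one number.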
