Pandas MultiIndex DataFrame: add a sub-index to each index

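I have a MultiIndex DataFrame whose columns have the top-level labels "bar" and "baz", each with the sub-columns "one" and "two". I now want to add a sub-column "three" under both "bar" and "baz".

Is there an elegant way to do this?

For example: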

import pandas as pd
import numpy as np

arrays = [["bar", "bar", "baz", "baz"],
          ["one", "two", "one", "two"]]

tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=["first", "second"])
df = pd.DataFrame(np.random.randn(3, 4), index=["A", "B", "C"], columns=index)
In [38]: df
Out[38]: 
first        bar                 baz          
second       one       two       one       two
A       0.357392 -1.880279  0.099014  1.354570
B       0.474572  0.442074 -1.173530 -1.362059
C      -0.980140 -0.173440 -1.490654 -0.539123

I would like something like this:

first        bar                           baz                    
second       one       two     three       one       two     three
A      -0.096890  0.012150       NaN -0.749569 -0.965033       NaN
B      -0.854206  0.118473       NaN  0.263058 -0.025849       NaN
C      -0.688007 -0.258569       NaN  0.127305 -0.955044       NaN

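Here is a general answer for when you don't necessarily know the names in level 0 of the column index and want to do this for every level-0 label:

First, we create the NaN matrix to inject. It has len(df) rows; for the number of columns, we count how many level-0 labels the DataFrame's columns have. Once it is created, we turn it into a DataFrame with the same index as our MultiIndex DataFrame and with columns built from its level-0 labels. Note that for this DataFrame we only need levels[0] of the original DataFrame, because for the next level we want 'three'.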

a = np.full((len(df),len(df.columns.levels[0])), np.nan)

inject_df = pd.DataFrame(a, index=df.index, columns=pd.MultiIndex.from_product([df.columns.levels[0], ['three']]))
inject_df

first  bar     baz
       three   three
A      NaN     NaN
B      NaN     NaN
C      NaN     NaN
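
One caveat worth checking before relying on levels[0]: a MultiIndex keeps all of its defined levels even when some labels are no longer used, for example after selecting a subset of columns. The sketch below illustrates this with a small example frame (the names df and sub are only illustrative) and shows two ways to get just the labels that are actually present.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)
df = pd.DataFrame(np.arange(12.0).reshape(3, 4), index=["A", "B", "C"], columns=columns)

# Keep only the "bar" columns; "baz" is no longer used by any column.
sub = df[["bar"]]

# levels[0] still lists every label the index was defined with.
defined = list(sub.columns.levels[0])

# Labels that actually occur in the columns, in order of appearance.
present = list(sub.columns.get_level_values(0).unique())

# Alternatively, drop the unused labels from the levels themselves.
trimmed = list(sub.columns.remove_unused_levels().levels[0])
```

If the frame has been sliced like this, building the injected columns from the labels that are actually present avoids creating a 'three' column under a label that no longer exists.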

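Finally, we concatenate the injected df with the original df and sort the column index so that columns sharing the same level-0 label sit side by side. Note that the sort also orders the second level alphabetically, so 'three' ends up between 'one' and 'two'.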

result = pd.concat([df, inject_df], axis=1).sort_index(level=0, axis=1)
result

first        bar                           baz
second       one     three       two       one     three       two
A      -0.995944       NaN -0.437629 -0.629472       NaN  1.919711
B      -0.402886       NaN  0.262420  0.117202       NaN -1.234542
C       1.281046       NaN -1.058977  0.447767       NaN  2.374122
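
If the new sub-column should come last within each group rather than in sorted order, one possible variation is to build the target column order explicitly and reindex. This is a minimal sketch, not part of the answer above: the helper name add_sublevel is hypothetical, and it assumes the column index has exactly two levels and that the new label does not already exist under any group.

```python
import numpy as np
import pandas as pd


def add_sublevel(df, label):
    """Return a copy of df with a NaN column (top, label) appended to every level-0 group."""
    target = []
    # Level-0 labels that actually occur, in order of first appearance.
    for top in df.columns.get_level_values(0).unique():
        # Existing columns of this group, in their current order...
        target.extend(col for col in df.columns if col[0] == top)
        # ...followed by the new sub-column.
        target.append((top, label))
    new_columns = pd.MultiIndex.from_tuples(target, names=df.columns.names)
    # reindex keeps existing columns and fills the missing ones with NaN.
    return df.reindex(columns=new_columns)


columns = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)
df = pd.DataFrame(np.arange(12.0).reshape(3, 4), index=["A", "B", "C"], columns=columns)
result = add_sublevel(df, "three")
```

Because the target order is built group by group, this also works when the groups do not all share the same sub-columns.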

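I'm not sure how Pythonic this is, but there are two ways to do it: plain assignment and insert.

  1. Assignment (the new columns are appended at the end, so sort the columns afterwards if you want them grouped)
df[('bar','three')] = np.nan
df[('baz','three')] = np.nan
  2. Insert (the first argument is the integer position of the new column)
df.insert(2,('bar','three'),np.nan)
df.insert(5,('baz','three'),np.nan)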
first        bar                           baz
second       one       two     three       one       two     three
A      -0.973338 -0.233507       NaN  0.777288 -2.282688       NaN
B      -0.377486  0.080627       NaN  0.401302  0.355696       NaN
C       0.481056  0.651335       NaN  0.161145  1.001937       NaN
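
The insert positions 2 and 5 are specific to this frame. As a rough sketch of how the same idea might be generalized, the loop below computes the position for each level-0 label instead of hard-coding it. It assumes the columns of each group are contiguous and that (label, 'three') does not exist yet; the example frame is only illustrative.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)
df = pd.DataFrame(np.arange(12.0).reshape(3, 4), index=["A", "B", "C"], columns=columns)

# Snapshot the level-0 labels before the loop starts modifying df.
top_labels = list(df.columns.get_level_values(0).unique())

for top in top_labels:
    # Integer positions of the columns currently under this level-0 label.
    positions = np.flatnonzero(df.columns.get_level_values(0) == top)
    # Insert the new column right after the last column of the group.
    df.insert(int(positions[-1]) + 1, (top, "three"), np.nan)
```

The positions are recomputed on every iteration, so earlier insertions shifting later columns to the right does not matter.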