Pandas MultiIndex DataFrame: add a sub-index to each index
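I have a MultiIndex DataFrame with the columns "bar" and "baz", each of which has the sub-columns "one" and "two". I now want to add a sub-column "three" under both "bar" and "baz".
Is there an elegant way to do this?
For example: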
import pandas as pd
import numpy as np
arrays = [["bar", "bar", "baz", "baz"],
["one", "two", "one", "two"]]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=["first", "second"])
df = pd.DataFrame(np.random.randn(3, 4), index=["A", "B", "C"], columns=index)
In [38]: df
Out[38]:
first        bar                 baz
second       one       two       one       two
A       0.357392 -1.880279  0.099014  1.354570
B       0.474572  0.442074 -1.173530 -1.362059
C      -0.980140 -0.173440 -1.490654 -0.539123
I want something like this:
first        bar                           baz
second       one       two     three       one       two     three
A      -0.096890  0.012150       NaN -0.749569 -0.965033       NaN
B      -0.854206  0.118473       NaN  0.263058 -0.025849       NaN
C      -0.688007 -0.258569       NaN  0.127305 -0.955044       NaN
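For a general answer, when you don't necessarily know the level-0 labels and want to do this for every level-0 label:
First, create the NaN matrix to inject. It has len(df) rows, and its number of columns equals the number of level-0 labels in the data frame. Then turn it into a DataFrame with the same row index as the original and MultiIndex columns. Note that for this frame we only need levels[0] of the original columns, because the second level is always 'three'.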
a = np.full((len(df),len(df.columns.levels[0])), np.nan)
inject_df = pd.DataFrame(a, index=df.index, columns=pd.MultiIndex.from_product([df.columns.levels[0], ['three']]))
inject_df
first    bar   baz
       three three
A        NaN   NaN
B        NaN   NaN
C        NaN   NaN
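One caveat about levels[0], shown as a minimal sketch: a MultiIndex keeps labels in its levels even after the matching columns are gone, so levels[0] can list more groups than the frame actually has. The frame `sub` below is a hypothetical subset made up for this illustration.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)
df = pd.DataFrame(np.random.randn(3, 4), index=["A", "B", "C"], columns=columns)

# Hypothetical subset: keep only the two "bar" columns.
sub = df.iloc[:, :2]

# The level still remembers "baz" although no column uses it any more.
stale = list(sub.columns.levels[0])

# Two ways to get only the labels that are really present.
present = list(sub.columns.get_level_values(0).unique())
cleaned = list(sub.columns.remove_unused_levels().levels[0])

print(stale, present, cleaned)
```

If the frame may have been filtered beforehand, building the injected columns from get_level_values(0).unique() instead of levels[0] avoids creating a 'three' column for a group that no longer exists.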
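Finally, concatenate the injected df with the original df and sort the column index so that columns sharing the same level-0 label end up side by side. The sort is lexicographic, so 'three' lands between 'one' and 'two'.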
result = pd.concat([df, inject_df], axis=1).sort_index(level=0, axis=1)
result
first        bar                       baz
second       one three       two       one three       two
A      -0.995944   NaN -0.437629 -0.629472   NaN  1.919711
B      -0.402886   NaN  0.262420  0.117202   NaN -1.234542
C       1.281046   NaN -1.058977  0.447767   NaN  2.374122
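If the one, two, three order from the question matters, one option is to reindex the columns against an explicit target instead of sorting. This is only a sketch: `second_order` and `target` are names made up for this example, and the level-0 labels are read with get_level_values rather than levels[0].

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
columns = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)
df = pd.DataFrame(np.random.randn(3, 4), index=["A", "B", "C"], columns=columns)

# NaN block with one "three" column per level-0 label.
first_labels = df.columns.get_level_values(0).unique()
inject_df = pd.DataFrame(
    np.nan,
    index=df.index,
    columns=pd.MultiIndex.from_product(
        [first_labels, ["three"]], names=df.columns.names
    ),
)

# Explicit column order we want under every level-0 label (assumed here).
second_order = ["one", "two", "three"]
target = pd.MultiIndex.from_product(
    [first_labels, second_order], names=df.columns.names
)

# Concatenate, then put the columns into the target order.
result = pd.concat([df, inject_df], axis=1).reindex(columns=target)
print(result)
```

Since reindex fills labels it cannot find with NaN, calling df.reindex(columns=target) directly on the original frame should give the same layout without building inject_df at all.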
I don't know how Pythonic this is, but there are two ways to do it: simple assignment and using insert.
- Assignment
df[('bar','three')] = np.nan
df[('baz','three')] = np.nan
- Insert
df.insert(2,('bar','three'),np.nan)
df.insert(5,('baz','three'),np.nan)
first        bar                           baz
second       one       two     three       one       two     three
A      -0.973338 -0.233507       NaN  0.777288 -2.282688       NaN
B      -0.377486  0.080627       NaN  0.401302  0.355696       NaN
C       0.481056  0.651335       NaN  0.161145  1.001937       NaN
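The insert variant hard-codes the positions 2 and 5. A possible generalization, sketched here under the assumption that each level-0 group occupies contiguous columns, is to compute the position from the current columns inside a loop:

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)
df = pd.DataFrame(np.random.randn(3, 4), index=["A", "B", "C"], columns=columns)

# Collect the level-0 labels once, before the frame is modified.
for label in df.columns.get_level_values(0).unique():
    # Positions of the columns that currently belong to this group.
    positions = np.flatnonzero(df.columns.get_level_values(0) == label)
    # Insert the new sub-column right after the last column of the group.
    df.insert(int(positions[-1]) + 1, (label, "three"), np.nan)

print(df)
```

The positions are recomputed on every pass because each insert shifts the columns that follow it.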