Creating a multiindexed `DataFrame` with a nested dictionary
This question is related to an earlier one. This time I want to go a step further. Given a dictionary like this:
import numpy

dd = {0: {"russell": {"score": numpy.random.rand(), "ping": numpy.random.randint(10, 100)},
          "cantor": {"score": numpy.random.rand(), "ping": numpy.random.randint(10, 100)},
          "godel": {"score": numpy.random.rand(), "ping": numpy.random.randint(10, 100)}},
      1: {"russell": {"score": numpy.random.rand(), "ping": numpy.random.randint(10, 100)},
          "cantor": {"score": numpy.random.rand(), "ping": numpy.random.randint(10, 100)},
          "godel": {"score": numpy.random.rand(), "ping": numpy.random.randint(10, 100)}}}
or a list like this:
ll = [{"russell": {"score": numpy.random.rand(), "ping": numpy.random.randint(10, 100)},
       "cantor": {"score": numpy.random.rand(), "ping": numpy.random.randint(10, 100)},
       "godel": {"score": numpy.random.rand(), "ping": numpy.random.randint(10, 100)}},
      {"russell": {"score": numpy.random.rand(), "ping": numpy.random.randint(10, 100)},
       "cantor": {"score": numpy.random.rand(), "ping": numpy.random.randint(10, 100)},
       "godel": {"score": numpy.random.rand(), "ping": numpy.random.randint(10, 100)}}]
I would like to build a `DataFrame` like:
                    russell                     godel                    cantor
                 score ping                score ping                score ping
0  0.17473916938994682   40   0.3443303845926545   47  0.43576522521017247   42
1   0.7341005512329682   22  0.14682222267827938   81   0.5662517436162526   59
where the column index is a `MultiIndex`. Is there a way to do this? If I try `pandas.DataFrame.from_dict(dd, orient="index")` or `pandas.DataFrame(ll)`, I get:
                                      russell                                       godel                                      cantor
0  {'score': 0.17473916938994682, 'ping': 40}   {'score': 0.3443303845926545, 'ping': 47}  {'score': 0.43576522521017247, 'ping': 42}
1   {'score': 0.7341005512329682, 'ping': 22}  {'score': 0.14682222267827938, 'ping': 81}   {'score': 0.5662517436162526, 'ping': 59}
which is not what I want.
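It is a bit complicated, but `Panel` with `transpose`, `to_frame` and `unstack` can help. Note that `Panel` was deprecated in pandas 0.20 and removed in 0.25, so this only runs on older versions: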
import pandas as pd

df = pd.Panel(dd).transpose(2,0,1).to_frame().unstack()
print(df)
      cantor           godel           russell
minor   ping     score  ping     score    ping     score
major
0       69.0  0.050641  51.0  0.765994    20.0  0.935196
1       91.0  0.398624  33.0  0.408681    75.0  0.464876
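For pandas versions that no longer ship `Panel`, one possible alternative is to flatten each inner record into `(name, field)` tuple keys and build the `MultiIndex` columns explicitly. This is only a sketch, not part of the original answer: the helper `flatten_row`, the seeded generator, and the assumption that every row has the same names and fields are all illustrative.

```python
import numpy as np
import pandas as pd

# Seeded only so the sketch is reproducible; the question uses unseeded numpy.random.
rng = np.random.default_rng(0)
names = ["russell", "cantor", "godel"]
dd = {
    i: {n: {"score": rng.random(), "ping": int(rng.integers(10, 100))} for n in names}
    for i in range(2)
}


def flatten_row(row):
    """Hypothetical helper: turn {name: {field: value}} into {(name, field): value}."""
    return {
        (name, field): value
        for name, inner in row.items()
        for field, value in inner.items()
    }


flat = {key: flatten_row(row) for key, row in dd.items()}

# Assumption: every row has the same (name, field) pairs as the first row.
columns = pd.MultiIndex.from_tuples(list(next(iter(flat.values()))))

df = pd.DataFrame(
    [[row[col] for col in columns] for row in flat.values()],
    index=list(flat),
    columns=columns,
)
print(df)
```

Built this way, the columns keep the insertion order of the dictionary (`russell`, `cantor`, `godel`), and `ping` stays an integer column because each column is filled separately.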
This works too. Note that your nested dictionary is not really nested in a way that translates easily into a `DataFrame`.
pd.concat({key:pd.DataFrame(dd[key]) for key in dd.keys()}).unstack()
Out[104]:
  cantor           godel           russell
    ping     score  ping     score    ping     score
0   73.0  0.463084  94.0  0.954662    76.0  0.732291
1   28.0  0.778905  81.0  0.984285    36.0  0.094173
In short, creating a multiindexed df with `concat` is very easy. All you need is a dictionary of DataFrames.
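The same idea should carry over to the list form `ll` from the question: build the dictionary of DataFrames with `enumerate`, so the list position becomes the row label. The following is a minimal sketch under that assumption; the seeded generator and the names `frames`, `stacked` and `wide` are illustrative only.

```python
import numpy as np
import pandas as pd

# Seeded only so the sketch is reproducible.
rng = np.random.default_rng(1)
names = ["russell", "cantor", "godel"]
ll = [
    {n: {"score": rng.random(), "ping": int(rng.integers(10, 100))} for n in names}
    for _ in range(2)
]

# One small DataFrame per record: rows are the fields, columns are the names.
frames = {i: pd.DataFrame(record) for i, record in enumerate(ll)}

# concat on a dict uses the keys as the outer level of a row MultiIndex.
stacked = pd.concat(frames)

# unstack moves the inner row level (the field) into the columns.
wide = stacked.unstack()
print(wide)
```

As in the output shown above, `ping` comes back as a float here, because each per-record frame stores `score` and `ping` in the same column before unstacking.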