Setting a row index on and querying a pandas DataFrame with MultiIndex columns
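Starting with a pandas DataFrame that has a multi-level column heading structure such as the one shown below, is there a way to transform the Area Names and Area Codes headings so that they span every level (i.e. a single Area Names label and a single Area Codes label spanning the multiple rows of column headings)?

If so, how could I then run a query on that column to return only the row corresponding to a particular value (for example, an Area Code of E06000047), or the Low and Very High values for ENGLAND in 2012/13?

I wonder whether it would be easier to define a row index based on either Area Code or Area Name, or a two-column row index ['*Area Code*', '*Area Names*']. If so, how can I do this from the current table? set_index seems to balk at working with the current structure.

Code fragment to create the above: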
import pandas as pd
df = pd.DataFrame({('2011/12*', 'High', '7-8'): {3: 49.83,
5: 50.01,
7: 48.09,
8: 43.58,
9: 44.19},
('2011/12*', 'Low', '0-4'): {3: 6.51, 5: 6.53, 7: 6.49, 8: 6.41, 9: 6.12},
('2011/12*', 'Medium', '5-6'): {3: 17.44,
5: 17.59,
7: 18.11,
8: 19.23,
9: 20.01},
('2011/12*', 'Very High', '9-10'): {3: 26.22,
5: 25.87,
7: 27.32,
8: 30.78,
9: 29.68},
('2012/13*', 'High', '7-8'): {3: 51.16,
5: 51.35,
7: 48.47,
8: 44.67,
9: 49.39},
('2012/13*', 'Low', '0-4'): {3: 5.71, 5: 5.74, 7: 6.73, 8: 8.42, 9: 6.51},
('2012/13*', 'Medium', '5-6'): {3: 17.1,
5: 17.29,
7: 18.46,
8: 20.23,
9: 15.81},
('2012/13*', 'Very High', '9-10'): {3: 26.03,
5: 25.62,
7: 26.34,
8: 26.68,
9: 28.3},
('Area Codes', 'Area Codes', 'Area Codes'): {3: 'K02000001',
5: 'E92000001',
7: 'E12000001',
8: 'E06000047',
9: 'E06000005'},
('Area Names', 'Area Names', 'Area Names'): {3: 'UNITED KINGDOM',
5: 'ENGLAND',
7: 'NORTH EAST',
8: 'County Durham',
9: 'Darlington'}})
I think you need set_index, passing the full tuples that identify each column in the column MultiIndex:
df.set_index([('Area Codes','Area Codes','Area Codes'),
('Area Names','Area Names','Area Names')], inplace=True)
df.index.names = ['Area Codes','Area Names']
print (df)
                           2011/12*                           2012/13*       \
                               High   Low  Medium  Very High      High   Low
                                7-8   0-4     5-6       9-10       7-8   0-4
Area Codes Area Names
K02000001  UNITED KINGDOM     49.83  6.51   17.44      26.22     51.16  5.71
E92000001  ENGLAND            50.01  6.53   17.59      25.87     51.35  5.74
E12000001  NORTH EAST         48.09  6.49   18.11      27.32     48.47  6.73
E06000047  County Durham      43.58  6.41   19.23      30.78     44.67  8.42
E06000005  Darlington         44.19  6.12   20.01      29.68     49.39  6.51

                           Medium  Very High
                              5-6       9-10
Area Codes Area Names
K02000001  UNITED KINGDOM   17.10      26.03
E92000001  ENGLAND          17.29      25.62
E12000001  NORTH EAST       18.46      26.34
E06000047  County Durham    20.23      26.68
E06000005  Darlington       15.81      28.30
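As a possible variation on the step above, the same result can be sketched without inplace=True by keeping the return value of set_index. The reduced frame below is an assumption made only to keep the example self-contained (two areas and two value columns taken from the question's data), and the names code_col, name_col and indexed are illustrative.

```python
import pandas as pd

# Reduced copy of the question's frame (assumption: two areas, two value columns).
df = pd.DataFrame({
    ('2012/13*', 'Low', '0-4'): {5: 5.74, 8: 8.42},
    ('2012/13*', 'Very High', '9-10'): {5: 25.62, 8: 26.68},
    ('Area Codes', 'Area Codes', 'Area Codes'): {5: 'E92000001', 8: 'E06000047'},
    ('Area Names', 'Area Names', 'Area Names'): {5: 'ENGLAND', 8: 'County Durham'},
})

# Each column of a three-level MultiIndex is addressed by its full tuple.
code_col = ('Area Codes', 'Area Codes', 'Area Codes')
name_col = ('Area Names', 'Area Names', 'Area Names')

# set_index returns a new frame; the original df is left untouched.
indexed = df.set_index([code_col, name_col])

# The index levels are initially named after the tuples, so give them short names.
indexed.index.names = ['Area Codes', 'Area Names']

print(indexed.index.names)
print(indexed)
```

Keeping the original frame unchanged makes it easier to re-run the step while experimenting; whether that matters is a matter of preference.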
Then sort_index is needed, because slicing an unsorted MultiIndex raises:
KeyError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (2), lexsort depth (0)'
df.sort_index(inplace=True)
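To see whether sorting is needed at all, one option is to check the index's is_monotonic_increasing property before and after sorting. The sketch below uses a small hypothetical frame with the same index and column structure as the answer. It also sorts the columns, on the assumption that they will be sliced by level later.

```python
import pandas as pd

# Small stand-in frame (assumption: three areas in deliberately unsorted order).
df = pd.DataFrame(
    {('2012/13*', 'Low', '0-4'): [5.74, 8.42, 6.51],
     ('2012/13*', 'Very High', '9-10'): [25.62, 26.68, 28.30]},
    index=pd.MultiIndex.from_tuples(
        [('E92000001', 'ENGLAND'),
         ('E06000047', 'County Durham'),
         ('E06000005', 'Darlington')],
        names=['Area Codes', 'Area Names']),
)

# A MultiIndex that is not lexicographically sorted cannot be range-sliced.
was_sorted = df.index.is_monotonic_increasing

# Sort the rows, then the columns, so both axes can be sliced by level.
df = df.sort_index(axis=0).sort_index(axis=1)

print(was_sorted, df.index.is_monotonic_increasing)
print(df.index.tolist())
```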
Finally, select using slicers:
idx = pd.IndexSlice
print (df.loc[idx['E06000047',:], :])
                          2011/12*                           2012/13*       \
                              High   Low  Medium  Very High      High   Low
                               7-8   0-4     5-6       9-10       7-8   0-4
Area Codes Area Names
E06000047  County Durham     43.58  6.41   19.23      30.78     44.67  8.42

                          Medium  Very High
                             5-6       9-10
Area Codes Area Names
E06000047  County Durham   20.23      26.68
print (df.loc[idx[:,'ENGLAND'], idx['2012/13*',['Low','Very High']]])
                       2012/13*
                            Low  Very High
                            0-4       9-10
Area Codes Area Names
E92000001  ENGLAND         5.74      25.62
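The two queries above could also be written without IndexSlice. This is an alternative sketch, not part of the answer: it uses DataFrame.xs for the single area code and a boolean mask built with get_level_values for ENGLAND. The frame is a reduced, already sorted stand-in for the indexed table, and the names durham, is_england and england are illustrative.

```python
import pandas as pd

# Reduced, already sorted stand-in for the indexed frame (assumption).
df = pd.DataFrame(
    {('2012/13*', 'High', '7-8'): [44.67, 51.35],
     ('2012/13*', 'Low', '0-4'): [8.42, 5.74],
     ('2012/13*', 'Very High', '9-10'): [26.68, 25.62]},
    index=pd.MultiIndex.from_tuples(
        [('E06000047', 'County Durham'), ('E92000001', 'ENGLAND')],
        names=['Area Codes', 'Area Names']),
)

# Cross-section on one index level; drop_level=False keeps both index levels.
durham = df.xs('E06000047', level='Area Codes', drop_level=False)

# Boolean mask from one index level, then pick the year and the two bands.
is_england = df.index.get_level_values('Area Names') == 'ENGLAND'
england = df.loc[is_england, '2012/13*'][['Low', 'Very High']]

print(durham)
print(england)
```

Selecting '2012/13*' on its own drops that top column level, so the second result keeps only the band and score-range levels in its header.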