Python 多索引:使用 2 级索引、DataFrame 查找坐标
Python Multi-Index: Finding cordinates with level 2 index, DataFrame
我有一个带有多索引索引和列的空 DataFrame。我还有一个字符串列表,它是二级索引的坐标。由于我所有的二级索引都是唯一的,我希望用我的字符串列表找到坐标和输入值。看看下面的例子
df=
DNA Cat2 ....
Item A B C D E F F H I J
DNA Item
Cat2 A 0 0 0 0 0 0 0 0 0 0
B 0 0 0 0 0 0 0 0 0 0
C 0 0 0 0 0 0 0 0 0 0
D 0 0 0 0 0 0 0 0 0 0
E 0 0 0 0 0 0 0 0 0 0
F 0 0 0 0 0 0 0 0 0 0
....
str_cord = [(A,B),(A,H),(A,I),(B,H),(B,I),(H,I)]
#and my output should be like below.
df_result=
DNA Cat2 ....
Item A B C D E F F H I J
DNA Item
Cat2 A 0 1 0 0 0 0 0 1 1 0
B 0 0 0 0 0 0 0 1 1 0
C 0 0 0 0 0 0 0 0 0 0
D 0 0 0 0 0 0 0 0 0 0
E 0 0 0 0 0 0 0 0 0 0
F 0 0 0 0 0 0 0 0 0 0
H 0 0 0 0 0 0 0 0 1 0
....
它看起来有点复杂,但我只想使用我的 str_cord[0] 作为 df_result 的坐标。我尝试使用 .loc,但似乎我需要输入 1 级索引。我正在寻找不必输入多索引 level1 并使用 level2 字符串查找坐标的方法。希望它有意义并提前致谢! (呵呵,数据本身就很大,尽量高效)
您可以使用:
for i, j in str_cord:
idx = pd.IndexSlice
df.loc[idx[:, i], idx[:, j]] = 1
样本:
L = list('ABCDEFGHIJ')
mux = pd.MultiIndex.from_product([['Cat1','Cat2'], L])
df = pd.DataFrame(0, index=mux, columns=mux)
print (df)
Cat1 Cat2
A B C D E F G H I J A B C D E F G H I J
Cat1 A 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
B 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
C 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
D 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
E 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
F 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
G 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
H 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
I 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
J 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Cat2 A 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
B 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
C 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
D 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
E 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
F 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
G 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
H 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
I 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
J 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
str_cord = [('A','B'),('A','H'),('A','I'),('B','H'),('B','I'),('H','I')]
for i, j in str_cord:
idx = pd.IndexSlice
df.loc[idx[:, i], idx[:, j]] = 1
print (df)
Cat1 Cat2
A B C D E F G H I J A B C D E F G H I J
Cat1 A 0 1 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 1 1 0
B 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 1 0
C 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
D 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
E 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
F 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
G 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
H 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0
I 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
J 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Cat2 A 0 1 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 1 1 0
B 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 1 0
C 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
D 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
E 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
F 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
G 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
H 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0
I 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
J 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
我有一个带有多索引索引和列的空 DataFrame。我还有一个字符串列表,它是二级索引的坐标。由于我所有的二级索引都是唯一的,我希望用我的字符串列表找到坐标和输入值。看看下面的例子
df=
DNA Cat2 ....
Item A B C D E F F H I J
DNA Item
Cat2 A 0 0 0 0 0 0 0 0 0 0
B 0 0 0 0 0 0 0 0 0 0
C 0 0 0 0 0 0 0 0 0 0
D 0 0 0 0 0 0 0 0 0 0
E 0 0 0 0 0 0 0 0 0 0
F 0 0 0 0 0 0 0 0 0 0
....
str_cord = [(A,B),(A,H),(A,I),(B,H),(B,I),(H,I)]
#and my output should be like below.
df_result=
DNA Cat2 ....
Item A B C D E F F H I J
DNA Item
Cat2 A 0 1 0 0 0 0 0 1 1 0
B 0 0 0 0 0 0 0 1 1 0
C 0 0 0 0 0 0 0 0 0 0
D 0 0 0 0 0 0 0 0 0 0
E 0 0 0 0 0 0 0 0 0 0
F 0 0 0 0 0 0 0 0 0 0
H 0 0 0 0 0 0 0 0 1 0
....
它看起来有点复杂,但我只想使用我的 str_cord[0] 作为 df_result 的坐标。我尝试使用 .loc,但似乎我需要输入 1 级索引。我正在寻找不必输入多索引 level1 并使用 level2 字符串查找坐标的方法。希望它有意义并提前致谢! (呵呵,数据本身就很大,尽量高效)
您可以使用:
for i, j in str_cord:
idx = pd.IndexSlice
df.loc[idx[:, i], idx[:, j]] = 1
样本:
L = list('ABCDEFGHIJ')
mux = pd.MultiIndex.from_product([['Cat1','Cat2'], L])
df = pd.DataFrame(0, index=mux, columns=mux)
print (df)
Cat1 Cat2
A B C D E F G H I J A B C D E F G H I J
Cat1 A 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
B 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
C 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
D 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
E 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
F 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
G 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
H 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
I 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
J 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Cat2 A 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
B 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
C 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
D 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
E 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
F 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
G 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
H 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
I 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
J 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
str_cord = [('A','B'),('A','H'),('A','I'),('B','H'),('B','I'),('H','I')]
for i, j in str_cord:
idx = pd.IndexSlice
df.loc[idx[:, i], idx[:, j]] = 1
print (df)
Cat1 Cat2
A B C D E F G H I J A B C D E F G H I J
Cat1 A 0 1 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 1 1 0
B 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 1 0
C 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
D 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
E 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
F 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
G 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
H 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0
I 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
J 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Cat2 A 0 1 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 1 1 0
B 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 1 0
C 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
D 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
E 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
F 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
G 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
H 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0
I 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
J 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0