Sort DataFrame index by the second position, with data coming from nested dictionaries
I have this code, where lig_dec_residue is a nested dictionary built from a large .dat file:
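import pandas as pd
import matplotlib.pyplot as plt

# Nested dict: one inner dict per column, keyed by 'NAME number' labels
lig_dec_residue = {'f1': {}, 'f2': {}, 'f3': {}}

def plot_lig(res):
    df = pd.DataFrame.from_dict(lig_dec_residue)
    # Shift the number in each 'NAME number' label by res - 1
    df.index = df.index.str.split(' ')
    df.index = df.index.str[0] + ' ' + (df.index.str[1].astype(int) + int(res) - 1).astype(str)
    # Keep only values <= -0.25 and drop rows that are entirely NaN
    df = df[df <= -0.25]
    df.dropna(how='all', inplace=True)
    df.plot(kind='bar', edgecolor='black')
    plt.legend(['X var', 'Y var', 'Z var'])
    plt.show()
    plt.close()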
This is the result:
               f1        f2        f3
ARG 403 -0.265999       NaN -0.390653
LEU 455 -1.948253 -2.125521 -1.988445
PHE 456 -1.974429 -1.835651 -2.177540
ALA 475 -0.796856 -1.032929 -0.968554
GLY 476 -0.262736 -0.744952 -0.257448
ASN 477       NaN       NaN -0.868419
PHE 486 -3.674621 -2.882512 -3.179725
ASN 487 -1.172256 -0.805725 -1.050299
LYS 493 -2.283489       NaN -5.231593
SER 496       NaN       NaN -0.366986
PHE 497       NaN -0.340862       NaN
ARG 498 -1.485091       NaN -1.140743
THR 500 -1.497597 -0.778616 -1.961580
TYR 501 -4.286950       NaN -4.851700
GLY 502 -0.447453 -0.808606 -0.702321
VAL 503 -0.256496 -0.371461 -0.977062
HIS 505 -1.420959       NaN -1.321259
LYS 417       NaN -1.115154       NaN
GLN 493       NaN -2.625195       NaN
GLY 496       NaN -1.232041       NaN
GLN 498       NaN -2.271338       NaN
ASN 501       NaN -4.152646       NaN
TYR 505       NaN -2.469813       NaN
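Pandas places the last six entries apart from the rest (look at TYR 501 and ASN 501: they should be next to each other, but they are not!).
The idea is to compare f1, f2 and f3 with a bar plot.
This is my output: [bar plot image]
Is there a way to sort the index properly? I think this output may be due to how the dictionaries are ordered.
I know there is the natsort library, but I cannot use it because the DataFrame comes from a nested dictionary.
I would like to group the bars by the number in the index (for example, HIS 505 next to TYR 505) so that they can be compared directly where applicable.
Thanks!
Ludovico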
Use sort_index with a custom key:
df = df.sort_index(key=lambda x: x.str.split().str[1].str.zfill(5))
print(df)
# Output
               f1        f2        f3
ARG 403 -0.265999       NaN -0.390653
LYS 417       NaN -1.115154       NaN
LEU 455 -1.948253 -2.125521 -1.988445
PHE 456 -1.974429 -1.835651 -2.177540
ALA 475 -0.796856 -1.032929 -0.968554
GLY 476 -0.262736 -0.744952 -0.257448
ASN 477       NaN       NaN -0.868419
PHE 486 -3.674621 -2.882512 -3.179725
ASN 487 -1.172256 -0.805725 -1.050299
LYS 493 -2.283489       NaN -5.231593
GLN 493       NaN -2.625195       NaN
SER 496       NaN       NaN -0.366986
GLY 496       NaN -1.232041       NaN
PHE 497       NaN -0.340862       NaN
GLN 498       NaN -2.271338       NaN
ARG 498 -1.485091       NaN -1.140743
THR 500 -1.497597 -0.778616 -1.961580
TYR 501 -4.286950       NaN -4.851700
ASN 501       NaN -4.152646       NaN
GLY 502 -0.447453 -0.808606 -0.702321
VAL 503 -0.256496 -0.371461 -0.977062
HIS 505 -1.420959       NaN -1.321259
TYR 505       NaN -2.469813       NaN
Details about the key:
>>> df.index.str.split().str[1].str.zfill(5)
Index(['00403', '00455', '00456', '00475', '00476', '00477', '00486', '00487',
       '00493', '00496', '00497', '00498', '00500', '00501', '00502', '00503',
       '00505', '00417', '00493', '00496', '00498', '00501', '00505'],
      dtype='object')
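As a variation on the padded-string key, the key function could also return integers, which avoids having to pick a padding width. The sketch below is only an illustration: the small DataFrame is made-up sample data in the same shape as the question's, and the helper name residue_number is hypothetical. It also asks for mergesort, which is a stable algorithm, so labels that share a number should keep their original relative order.

```python
import numpy as np
import pandas as pd

# Made-up sample in the same shape as the question's data (not the real values).
df = pd.DataFrame(
    {
        "f1": [-4.3, -1.4, np.nan, np.nan],
        "f2": [np.nan, np.nan, -4.2, -2.5],
    },
    index=["TYR 501", "HIS 505", "ASN 501", "LYS 417"],
)


def residue_number(index):
    """Map labels such as 'TYR 501' to the integer 501 for sorting."""
    return index.str.split().str[1].astype(int)


# mergesort is stable: rows with the same number keep their original order.
df_sorted = df.sort_index(key=residue_number, kind="mergesort")
print(df_sorted)
```

Calling df_sorted.plot(kind='bar', edgecolor='black') afterwards would then draw the bars in this order, so residues with the same number end up side by side.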
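Note: when two numbers do not have the same number of digits, padding with 0 lets you get a natural sort from a plain string comparison: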
>>> '23' > '5'
False
>>> '23' > '05'
True
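To make the note concrete, here is a minimal plain-Python sketch comparing three sort keys on a few hypothetical labels (invented for this illustration, with deliberately different digit counts): the raw number text, the zero-padded text, and the integer value. The helper name number_part is also an assumption of this sketch.

```python
# Hypothetical labels, chosen only to show the effect of differing digit counts.
labels = ["TYR 501", "ALA 75", "ASN 501", "GLY 1002"]


def number_part(label):
    """Return the second token of a 'NAME number' label, as text."""
    return label.split()[1]


# Plain string comparison: '1002' < '501' < '75', which is not numeric order.
by_text = sorted(labels, key=number_part)

# Zero-padded strings compare like numbers as long as the width is large enough.
by_padded = sorted(labels, key=lambda s: number_part(s).zfill(5))

# Converting to int gives the same order without choosing a width.
by_int = sorted(labels, key=lambda s: int(number_part(s)))

print(by_text)    # ['GLY 1002', 'TYR 501', 'ASN 501', 'ALA 75']
print(by_padded)  # ['ALA 75', 'TYR 501', 'ASN 501', 'GLY 1002']
print(by_int)     # ['ALA 75', 'TYR 501', 'ASN 501', 'GLY 1002']
```

The padding width (5 here) only needs to be at least as long as the longest number in the index; a narrower width would leave longer numbers unpadded and bring back the string-order problem.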