Apply and group by in pandas (Python)
I have a DataFrame like this:
df:
       cell COMBINATION_ID  PREDICTION  SYNERGY_SCORE
0    BT-549     ADAM17.AKT    2.188390       7.398240
1   CAL-148     ADAM17.AKT   10.030628      12.686340
2     HCC38     ADAM17.AKT    9.217011      -4.351590
3   DU-4475    ADAM17.FGFR   -2.130943     -14.398730
4   HCC1187    ADAM17.FGFR   -1.103040      -6.400371
5     HCC70    ADAM17.FGFR   -2.076458     -14.909000
6  Hs-578-T    ADAM17.FGFR    3.831822      -7.859544
I want to group by COMBINATION_ID and get the correlation between PREDICTION and SYNERGY_SCORE within each group.
The result would look like this:
ADAM17.AKT cor([2.188390, 10.030628, 9.217011], [7.398240, 12.686340, -4.351590])
ADAM17.FGFR cor([-2.130943, -1.103040, -2.076458, 3.831822], [-14.398730, -6.400371, -14.909000, -7.859544])
I can use:
df2 = df.groupby('COMBINATION_ID').apply(f)
But I don't know how to define f().
Thanks.
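Consider using pandas' corr() inside a user-defined function, assuming you have the scipy package installed alongside pandas. You can specify any of the following methods: pearson (the default), kendall, or spearman: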
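def f(group):
    # apply() passes the sub-DataFrame for one COMBINATION_ID at a time
    group = group.copy()  # avoid modifying the group that pandas hands in
    group['CORRELATION'] = group['PREDICTION'].corr(group['SYNERGY_SCORE'], method='spearman')
    return group

# group_keys=False keeps the original row index on the result
df2 = df.groupby('COMBINATION_ID', group_keys=False).apply(f)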
You can check the new column against the actual numbers:
from scipy.stats import spearmanr

# ADAM17.AKT
print(spearmanr([2.188390, 10.030628, 9.217011],
                [7.398240, 12.686340, -4.351590]))
# ADAM17.FGFR
print(spearmanr([-2.130943, -1.103040, -2.076458, 3.831822],
                [-14.398730, -6.400371, -14.909000, -7.859544]))
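If only one correlation value per COMBINATION_ID is wanted, as in the desired output shown in the question, the function can return a scalar instead of the whole group. The following is a minimal sketch under a few assumptions: the DataFrame is rebuilt by hand from the table in the question, the names group_corr and per_group are made up for illustration, and the default pearson method is used so that scipy is not required.

```python
import pandas as pd

# Rebuild the example data from the question
df = pd.DataFrame({
    "cell": ["BT-549", "CAL-148", "HCC38", "DU-4475",
             "HCC1187", "HCC70", "Hs-578-T"],
    "COMBINATION_ID": ["ADAM17.AKT"] * 3 + ["ADAM17.FGFR"] * 4,
    "PREDICTION": [2.188390, 10.030628, 9.217011,
                   -2.130943, -1.103040, -2.076458, 3.831822],
    "SYNERGY_SCORE": [7.398240, 12.686340, -4.351590,
                      -14.398730, -6.400371, -14.909000, -7.859544],
})


def group_corr(group, method="pearson"):
    """Return a single correlation value for one group (hypothetical helper)."""
    return group["PREDICTION"].corr(group["SYNERGY_SCORE"], method=method)


# Selecting the two numeric columns first means the function only sees
# the data it needs; the result is a Series indexed by COMBINATION_ID.
per_group = (
    df.groupby("COMBINATION_ID")[["PREDICTION", "SYNERGY_SCORE"]]
      .apply(group_corr)
)
print(per_group)
```

Passing method="spearman" or method="kendall" through apply (for example .apply(group_corr, method="spearman")) should work the same way, but those two methods rely on scipy being installed.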