Apply and groupby in pandas (Python)

I have a DataFrame like this:

df:

        cell            COMBINATION_ID     PREDICTION   SYNERGY_SCORE
0       BT-549            ADAM17.AKT    2.188390       7.398240
1      CAL-148            ADAM17.AKT   10.030628      12.686340
2        HCC38            ADAM17.AKT    9.217011      -4.351590
3      DU-4475           ADAM17.FGFR   -2.130943     -14.398730
4      HCC1187           ADAM17.FGFR   -1.103040      -6.400371
5        HCC70           ADAM17.FGFR   -2.076458     -14.909000
6     Hs-578-T           ADAM17.FGFR    3.831822      -7.859544

I want to group by COMBINATION_ID and get the correlation between PREDICTION and SYNERGY_SCORE within each group.

The result would look like this:

ADAM17.AKT   cor([2.188390, 10.030628, 9.217011], [7.398240, 12.686340, -4.351590])
ADAM17.FGFR  cor([-2.130943, -1.103040, -2.076458, 3.831822], [-14.398730, -6.400371, -14.909000, -7.859544])

I can use:

df2 = df.groupby('COMBINATION_ID').apply(f)

But I don't know how to define def f():

Thanks

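Consider using pandas' corr() inside a defined function, assuming the scipy package is installed alongside pandas. You can specify any of the following methods: pearson (the default), kendall, or spearman: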

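def f(row):
    # Despite the name, `row` is the sub-DataFrame for one COMBINATION_ID group.
    # The scalar correlation is broadcast to every row of that group.
    row['CORRELATION'] = row['PREDICTION'].corr(row['SYNERGY_SCORE'], method='spearman')
    return row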

df2 = df.groupby('COMBINATION_ID').apply(f)

You can check the new column above against the actual numbers:

from scipy.stats import spearmanr

# ADAM17.AKT 
print(spearmanr([2.188390,10.030628,9.217011],
                [7.398240,12.686340,-4.351590]))
# ADAM17.FGFR 
print(spearmanr([-2.130943,-1.103040, -2.076458 ,3.831822],
                [-14.398730,-6.400371,-14.909000,-7.859544]))
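
If one value per COMBINATION_ID is preferred, as in the output sketched in the question, the function can return the scalar instead of adding a column. The following is a minimal sketch under a few assumptions: the sample frame is rebuilt from the data in the question, the helper name group_corr and the output name CORRELATION are illustrative, and the default pearson method is used so that scipy is not required (pass method='spearman' to match the answer above).

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "cell": ["BT-549", "CAL-148", "HCC38", "DU-4475",
             "HCC1187", "HCC70", "Hs-578-T"],
    "COMBINATION_ID": ["ADAM17.AKT"] * 3 + ["ADAM17.FGFR"] * 4,
    "PREDICTION": [2.188390, 10.030628, 9.217011, -2.130943,
                   -1.103040, -2.076458, 3.831822],
    "SYNERGY_SCORE": [7.398240, 12.686340, -4.351590, -14.398730,
                      -6.400371, -14.909000, -7.859544],
})


def group_corr(group, method="pearson"):
    # Hypothetical helper: `group` is the sub-DataFrame for one COMBINATION_ID.
    # Returning a scalar makes apply() produce one value per group.
    return group["PREDICTION"].corr(group["SYNERGY_SCORE"], method=method)


# Select only the two numeric columns so the grouping key is not passed in.
result = (
    df.groupby("COMBINATION_ID")[["PREDICTION", "SYNERGY_SCORE"]]
      .apply(group_corr)
      .rename("CORRELATION")
)
print(result)
```

The result is a Series indexed by COMBINATION_ID; calling result.reset_index() turns it into a two-column DataFrame if that shape is more convenient.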