How to make a scatterplot showing correlation between a single gene vs multiple genes?

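I have a matrix with samples as rows and genes as columns, containing gene expression values (RPKM).

Below is some example data. The original data has more than 800 samples.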

        LINP1  EGFR         RB1         TP53        CDKN2A      MYC
Sample1 0.02   0.038798682  0.1423662   2.778587067 0.471403939 18.93687655
Sample2 0      0.059227225  0.208765213 0.818810739 0.353671882 1.379027685
Sample3 0      0.052116384  0.230437735 2.535040249 0.504061015 9.773089223
Sample4 0.06   0.199264618  0.261100548 2.516963635 0.63659138  11.01441624
Sample5 0      0.123521916  0.273330986 2.751309388 0.623572499 34.0563519
Sample6 0      0.128767634  0.263491811 2.882878373 0.359322715 13.02402045
Sample7 0      0.080097356  0.234511372 3.568192768 0.386217698 9.068928569
Sample8 0      0.017421323  0.247775683 5.109428797 0.068760572 15.7490551
Sample9 0      2.10281137   0.401582013 8.202902242 0.140596724 60.25989178

To make a scatterplot showing the correlation between two genes, I used ggscatter:

library(ggpubr)  # provides ggscatter()

# A2 is the data frame holding the expression matrix
ggscatter(A2, x = "LINP1", y = "EGFR", 
          add = "reg.line", conf.int = FALSE, 
          cor.coef = TRUE, cor.method = "pearson",
          xlab = "LINP1", ylab = "EGFR", xscale="log2", yscale="log2")

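The scatterplot looks like this.

I want to draw a scatterplot like this:

Figure 2g in this research paper, where LINP1 expression is shown against all the other genes in a single plot. Is there any code for this?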

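A Pearson correlation describes the same linear relationship as the fitted line of a simple linear regression drawn over a scatterplot (the squared correlation equals the regression's R-squared). You can draw that line with ggplot2::geom_smooth(method = "lm") on top of a scatterplot of your genes.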
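To make the link between Pearson correlation and the regression line concrete, here is a minimal base R sketch. It uses the TP53 and MYC values from the example data, chosen only because neither contains zeros; the variable names are illustrative.

```r
# Two genes from the example data (values copied from the question)
tp53 <- c(2.778587067, 0.818810739, 2.535040249, 2.516963635, 2.751309388,
          2.882878373, 3.568192768, 5.109428797, 8.202902242)
myc  <- c(18.93687655, 1.379027685, 9.773089223, 11.01441624, 34.0563519,
          13.02402045, 9.068928569, 15.7490551, 60.25989178)

r   <- cor(tp53, myc, method = "pearson")  # Pearson correlation coefficient
fit <- lm(myc ~ tp53)                      # simple linear regression

# Squared Pearson r versus the R-squared of the regression
r2_matches <- all.equal(r^2, summary(fit)$r.squared)

# Regression slope versus r rescaled by the ratio of standard deviations
slope_matches <- all.equal(unname(coef(fit)["tp53"]), r * sd(myc) / sd(tp53))

print(c(r2_matches = r2_matches, slope_matches = slope_matches))
```

Both comparisons should come back TRUE, which is why a fitted line from geom_smooth(method = "lm") and a reported Pearson coefficient tell the same story about a gene pair.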

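Edit: Updated, per the OP's comment, to use a log2() transformation on both axes. Note that transformations can sometimes produce invalid values. Your data contains 0s, for which log2() returns -Inf: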

library(tidyr)
library(ggplot2)

df <- read.table(text = "
LINP1   EGFR            RB1       TP53         CDKN2A      MYC
Sample1 0.02   0.038798682  0.1423662   2.778587067 0.471403939 18.93687655
Sample2 0      0.059227225  0.208765213 0.818810739 0.353671882 1.379027685
Sample3 0      0.052116384  0.230437735 2.535040249 0.504061015 9.773089223
Sample4 0.06   0.199264618  0.261100548 2.516963635 0.63659138  11.01441624
Sample5 0      0.123521916  0.273330986 2.751309388 0.623572499 34.0563519
Sample6 0      0.128767634  0.263491811 2.882878373 0.359322715 13.02402045
Sample7 0      0.080097356  0.234511372 3.568192768 0.386217698 9.068928569
Sample8 0      0.017421323  0.247775683 5.109428797 0.068760572 15.7490551
Sample9 0      2.10281137   0.401582013 8.202902242 0.140596724 60.25989178", header = TRUE)



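# Reshape to long format: one row per (sample, gene) pair, with LINP1 kept as its own column
df %>% 
  gather(key = variable, value = values, EGFR:MYC) %>% 
  ggplot(aes(LINP1, values)) + 
  geom_point() + 
  # One panel per gene
  facet_grid(. ~ variable, scales = "free_x") + 
  # Linear regression fit line, without the confidence band
  geom_smooth(method = "lm", se = FALSE) + 
  # log2 axes; zeros become -Inf, so expect warnings about infinite or non-finite values
  scale_y_continuous(trans = "log2") + 
  scale_x_continuous(trans = "log2")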
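
One possible way around the -Inf values is to add a small pseudocount before taking log2(), and to label each panel with its Pearson coefficient. The sketch below is only a variation on the code above, not something from the question: the pseudocount of 0.01 is an arbitrary assumption (it changes the correlation values, so choose it to suit your data), and the names pseudocount, log_df, long_df and cors are made up for illustration. It also swaps gather() for tidyr's newer pivot_longer() and uses facet_wrap() so each gene gets its own y-axis.

```r
library(tidyr)
library(ggplot2)

# Same example data as above; the header has one fewer field than the rows,
# so read.table() uses the first column as row names.
df <- read.table(text = "
LINP1   EGFR            RB1       TP53         CDKN2A      MYC
Sample1 0.02   0.038798682  0.1423662   2.778587067 0.471403939 18.93687655
Sample2 0      0.059227225  0.208765213 0.818810739 0.353671882 1.379027685
Sample3 0      0.052116384  0.230437735 2.535040249 0.504061015 9.773089223
Sample4 0.06   0.199264618  0.261100548 2.516963635 0.63659138  11.01441624
Sample5 0      0.123521916  0.273330986 2.751309388 0.623572499 34.0563519
Sample6 0      0.128767634  0.263491811 2.882878373 0.359322715 13.02402045
Sample7 0      0.080097356  0.234511372 3.568192768 0.386217698 9.068928569
Sample8 0      0.017421323  0.247775683 5.109428797 0.068760572 15.7490551
Sample9 0      2.10281137   0.401582013 8.202902242 0.140596724 60.25989178", header = TRUE)

pseudocount <- 0.01  # ASSUMPTION: arbitrary value, not from the question

# log2-transform every gene after adding the pseudocount, so zeros stay finite
log_df <- log2(df + pseudocount)

# Long format: one row per (sample, gene) pair, with LINP1 kept as its own column
long_df <- pivot_longer(log_df, cols = -LINP1,
                        names_to = "gene", values_to = "expression")

# Pearson correlation of LINP1 with each other gene, used as panel labels
genes <- setdiff(names(log_df), "LINP1")
cors <- data.frame(
  gene = genes,
  r = sapply(genes, function(g) cor(log_df$LINP1, log_df[[g]]))
)
cors$label <- sprintf("r = %.2f", cors$r)

p <- ggplot(long_df, aes(LINP1, expression)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  # Place each coefficient in the top-left corner of its own panel
  geom_text(data = cors, aes(x = -Inf, y = Inf, label = label),
            hjust = -0.1, vjust = 1.5, inherit.aes = FALSE) +
  facet_wrap(~ gene, scales = "free_y") +
  labs(x = "log2(LINP1 + pseudocount)", y = "log2(expression + pseudocount)")

print(p)
```

Because the data are transformed before plotting, the axes here are already in log2 units and no scale_*_continuous(trans = "log2") call is needed. With the 800+ samples of the real data the panels should be far more informative than with these nine rows.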