How to make a scatterplot showing correlation between a single gene vs multiple genes?
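I have a matrix with samples as rows and genes as columns, holding gene expression values (RPKM).
Below is some example data. The original data has more than 800 samples.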
LINP1 EGFR RB1 TP53 CDKN2A MYC
Sample1 0.02 0.038798682 0.1423662 2.778587067 0.471403939 18.93687655
Sample2 0 0.059227225 0.208765213 0.818810739 0.353671882 1.379027685
Sample3 0 0.052116384 0.230437735 2.535040249 0.504061015 9.773089223
Sample4 0.06 0.199264618 0.261100548 2.516963635 0.63659138 11.01441624
Sample5 0 0.123521916 0.273330986 2.751309388 0.623572499 34.0563519
Sample6 0 0.128767634 0.263491811 2.882878373 0.359322715 13.02402045
Sample7 0 0.080097356 0.234511372 3.568192768 0.386217698 9.068928569
Sample8 0 0.017421323 0.247775683 5.109428797 0.068760572 15.7490551
Sample9 0 2.10281137 0.401582013 8.202902242 0.140596724 60.25989178
To make a scatter plot showing the correlation between two genes, I used ggscatter:
library(ggpubr)  # provides ggscatter()
ggscatter(A2, x = "LINP1", y = "EGFR",
          add = "reg.line", conf.int = FALSE,
          cor.coef = TRUE, cor.method = "pearson",
          xlab = "LINP1", ylab = "EGFR", xscale = "log2", yscale = "log2")
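The scatter plot looks like this
I would like to make a scatter plot like this
Figure 2g in this Research paper, where LINP1 expression is shown against all the other genes in a single figure. Is there any code that does this?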
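A Pearson correlation measures the same linear relationship that you see when you draw a scatter plot and add the fitted line of a linear regression model. You can add that line with ggplot2::geom_smooth() on top of a scatter plot of your genes.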
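To make that link concrete, here is a small sketch using the LINP1 and EGFR values from the example data in the question. It relies on two standard facts about a linear regression with a single predictor: the squared Pearson coefficient equals the model's R², and the slope equals r scaled by the ratio of the standard deviations. The variable names (linp1, egfr, r, fit) are only illustrative.

```r
# LINP1 and EGFR values copied from the example data in the question
linp1 <- c(0.02, 0, 0, 0.06, 0, 0, 0, 0, 0)
egfr  <- c(0.038798682, 0.059227225, 0.052116384, 0.199264618, 0.123521916,
           0.128767634, 0.080097356, 0.017421323, 2.10281137)

r   <- cor(linp1, egfr, method = "pearson")  # Pearson correlation coefficient
fit <- lm(egfr ~ linp1)                      # simple linear regression

# Squared Pearson r equals R^2 of the one-predictor linear model
all.equal(r^2, summary(fit)$r.squared)

# The fitted slope is r rescaled by sd(y) / sd(x)
all.equal(unname(coef(fit)["linp1"]), r * sd(egfr) / sd(linp1))
```

Both comparisons return TRUE, which is why a regression line from geom_smooth(method = "lm") and a Pearson coefficient describe the same linear trend.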
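Edit:
Updated, following the OP's comment, to use a log2() transformation on both axes. Note that a transformation can produce invalid values. Your data contains 0s, and log2() of 0 returns -Inf: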
library(tidyr)
library(ggplot2)
df <- read.table(text = "
LINP1 EGFR RB1 TP53 CDKN2A MYC
Sample1 0.02 0.038798682 0.1423662 2.778587067 0.471403939 18.93687655
Sample2 0 0.059227225 0.208765213 0.818810739 0.353671882 1.379027685
Sample3 0 0.052116384 0.230437735 2.535040249 0.504061015 9.773089223
Sample4 0.06 0.199264618 0.261100548 2.516963635 0.63659138 11.01441624
Sample5 0 0.123521916 0.273330986 2.751309388 0.623572499 34.0563519
Sample6 0 0.128767634 0.263491811 2.882878373 0.359322715 13.02402045
Sample7 0 0.080097356 0.234511372 3.568192768 0.386217698 9.068928569
Sample8 0 0.017421323 0.247775683 5.109428797 0.068760572 15.7490551
Sample9 0 2.10281137 0.401582013 8.202902242 0.140596724 60.25989178", header = TRUE)
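df %>%
  # reshape to long format: one row per sample and gene, LINP1 stays as its own column
  gather(key = variable, value = values, EGFR:MYC) %>%
  ggplot(aes(LINP1, values)) +
  geom_point() +
  # one panel per gene, side by side
  facet_grid(. ~ variable, scales = "free_x") +
  # linear fit without the confidence band
  geom_smooth(method = "lm", se = FALSE) +
  # log2 axes; zeros in the data become -Inf here
  scale_y_continuous(trans = "log2") +
  scale_x_continuous(trans = "log2")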
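One possible way around the -Inf values, and to print a Pearson coefficient in each panel as ggscatter does, is sketched below. This is a variation and not part of the answer above. It assumes that a pseudocount of 1 before log2() is acceptable for your analysis, and it uses tidyr::pivot_longer() and facet_wrap() in place of gather() and facet_grid(). The names long, r_by_gene, labels and p are made up for this example.

```r
library(tidyr)
library(ggplot2)

df <- read.table(text = "
LINP1 EGFR RB1 TP53 CDKN2A MYC
Sample1 0.02 0.038798682 0.1423662 2.778587067 0.471403939 18.93687655
Sample2 0 0.059227225 0.208765213 0.818810739 0.353671882 1.379027685
Sample3 0 0.052116384 0.230437735 2.535040249 0.504061015 9.773089223
Sample4 0.06 0.199264618 0.261100548 2.516963635 0.63659138 11.01441624
Sample5 0 0.123521916 0.273330986 2.751309388 0.623572499 34.0563519
Sample6 0 0.128767634 0.263491811 2.882878373 0.359322715 13.02402045
Sample7 0 0.080097356 0.234511372 3.568192768 0.386217698 9.068928569
Sample8 0 0.017421323 0.247775683 5.109428797 0.068760572 15.7490551
Sample9 0 2.10281137 0.401582013 8.202902242 0.140596724 60.25989178", header = TRUE)

# Long format: one row per (sample, gene) pair, LINP1 kept as its own column
long <- pivot_longer(df, cols = -LINP1, names_to = "gene", values_to = "expr")

# Assumption: add a pseudocount of 1 so that zeros map to 0 instead of -Inf
long$log_linp1 <- log2(long$LINP1 + 1)
long$log_expr  <- log2(long$expr + 1)

# Pearson r per gene, computed on the transformed values
r_by_gene <- sapply(split(long, long$gene),
                    function(d) cor(d$log_linp1, d$log_expr))
labels <- data.frame(gene  = names(r_by_gene),
                     label = sprintf("r = %.2f", r_by_gene))

p <- ggplot(long, aes(log_linp1, log_expr)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  # put the coefficient in the top-left corner of each panel
  geom_text(data = labels, aes(x = -Inf, y = Inf, label = label),
            hjust = -0.1, vjust = 1.5, inherit.aes = FALSE) +
  # one panel per gene, each with its own y range
  facet_wrap(~ gene, scales = "free_y") +
  labs(x = "log2(LINP1 + 1)", y = "log2(RPKM + 1)")
print(p)
```

If you would rather overlay all genes in one panel, as the figure in the paper seems to do, one option is to drop facet_wrap() and map colour = gene inside aes(). Keep in mind that the pseudocount changes the correlation values compared with the untransformed data, so report which transformation you used.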