Is there a way to produce a scatterplot using ggplot2 from a tibble with all the values in a single column, without pivot_wider?
I have a data frame with three columns:
sampleData <- structure(list(sgRNA = c("SFPQ_9", "SFPQ_9", "FBXO18_13", "FBXO18_13",
"DDX21_55", "DDX21_55", "TAF6L_11", "TAF6L_11", "NAA40_3", "NAA40_3",
"KDM5A_1", "KDM5A_1", "DGKH_17", "DGKH_17", "NAA30_15", "NAA30_15",
"HMG20A_8", "HMG20A_8", "CASKIN1_35", "CASKIN1_35", "NUBP1_20",
"NUBP1_20", "CTCF_9", "CTCF_9", "THAP11_17", "THAP11_17", "EZH1_9",
"EZH1_9", "SMARCD2_21", "SMARCD2_21", "E2F6_6", "E2F6_6", "CENPA_11",
"CENPA_11", "SP140_35", "SP140_35", "SETD4_3", "SETD4_3", "STAG3_9",
"STAG3_9", "RAD54B_39", "RAD54B_39", "SMC1A_59", "SMC1A_59",
"ZNF257_1246", "ZNF257_1246", "DYNC1I2_4", "DYNC1I2_4", "NTC_77",
"NTC_77"), replicate = c("R1", "R2", "R1", "R2", "R1", "R2",
"R1", "R2", "R1", "R2", "R1", "R2", "R1", "R2", "R1", "R2", "R1",
"R2", "R1", "R2", "R1", "R2", "R1", "R2", "R1", "R2", "R1", "R2",
"R1", "R2", "R1", "R2", "R1", "R2", "R1", "R2", "R1", "R2", "R1",
"R2", "R1", "R2", "R1", "R2", "R1", "R2", "R1", "R2", "R1", "R2"
), abundance = c(450L, 583L, 209L, 231L, 212L, 288L, 958L, 1103L,
562L, 717L, 388L, 452L, 290L, 330L, 201L, 281L, 142L, 258L, 608L,
850L, 218L, 328L, 522L, 711L, 623L, 772L, 371L, 471L, 56L, 52L,
160L, 135L, 359L, 416L, 213L, 348L, 301L, 416L, 185L, 256L, 222L,
238L, 347L, 536L, 765L, 973L, 115L, 117L, 102L, 132L)), row.names = c(NA,
-50L), class = c("tbl_df", "tbl", "data.frame"))
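I want to build a scatterplot. All the values are in the "abundance" column; "replicate" specifies whether an observation is used as the x or the y coordinate, and "sgRNA" identifies the point. I know I can pivot the data to a wider format to produce two new columns, "R1" and "R2", and plot them against each other with ggplot2, but is there a way to do this without pivoting?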
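I agree with @www's comment: it is not clear why you don't want to pivot.
To answer your question: no, you need to reshape the data one way or another.
If you don't like pivot_wider, you can combine xtabs with as.data.frame.matrix like this: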
ggplot(as.data.frame.matrix(xtabs(abundance ~ ., data = sampleData)), aes(R1, R2)) +
geom_point()
But this is still a long-to-wide reshape.
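To see what the xtabs step actually produces, here is a minimal sketch on a toy subset (the first three guides from the question's data). The object names toy, tab and wide are illustrative only and are not part of the original answer.

```r
# Toy subset: the first three guides from the question's data.
toy <- data.frame(
  sgRNA     = rep(c("SFPQ_9", "FBXO18_13", "DDX21_55"), each = 2),
  replicate = rep(c("R1", "R2"), times = 3),
  abundance = c(450L, 583L, 209L, 231L, 212L, 288L)
)

# xtabs() sums abundance within each sgRNA x replicate cell; with exactly
# one row per cell, the sum is just the original value.
tab <- xtabs(abundance ~ sgRNA + replicate, data = toy)

# as.data.frame.matrix() keeps the two-dimensional layout (one column per
# replicate); plain as.data.frame() on a table would go back to long format.
wide <- as.data.frame.matrix(tab)
print(wide)
```

The guides end up as row names, sorted alphabetically, and R1 and R2 become ordinary columns, which is why aes(R1, R2) works in the plot call above. Note that if a guide appeared more than once within a replicate, xtabs would silently add the values together.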
I suppose that in some rare cases, say you are on a company computer and cannot access tidyr, the following approach may work; it uses only dplyr and ggplot2.
library(dplyr)
library(ggplot2)
R1 <- sampleData %>% filter(replicate %in% "R1") %>% select(-replicate)
R2 <- sampleData %>% filter(replicate %in% "R2") %>% select(-replicate)
R1R2 <- R1 %>% left_join(R2, by = "sgRNA", suffix = c("_R1", "_R2"))
ggplot(R1R2, aes(x = abundance_R1, y = abundance_R2)) +
geom_point()
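If dplyr is not available either, the same filter-and-join idea can probably be sketched in base R with subset() and merge(). The toy data frame is just the first three guides from the question, and the names toy, r1, r2 and joined are illustrative assumptions.

```r
# Toy subset: the first three guides from the question's data.
toy <- data.frame(
  sgRNA     = rep(c("SFPQ_9", "FBXO18_13", "DDX21_55"), each = 2),
  replicate = rep(c("R1", "R2"), times = 3),
  abundance = c(450L, 583L, 209L, 231L, 212L, 288L)
)

# One data frame per replicate, dropping the now-constant replicate column.
r1 <- subset(toy, replicate == "R1", select = -replicate)
r2 <- subset(toy, replicate == "R2", select = -replicate)

# Join on the guide name; the suffixes disambiguate the two abundance columns.
# all.x = TRUE keeps every R1 guide, like left_join(); a guide missing from R2
# would get NA in abundance_R2.
joined <- merge(r1, r2, by = "sgRNA", suffixes = c("_R1", "_R2"), all.x = TRUE)
print(joined)
```

The result has the same abundance_R1 and abundance_R2 columns as R1R2 above, so the same ggplot call applies. Unlike left_join, merge sorts the rows by the key by default, which makes no difference for a scatterplot.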
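Sometimes I think it's less about writing extra code and more about reproducibility and understanding what was done, whether for yourself some time down the road or for others trying to understand what you did.
The following works, but it isn't the most intuitive: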
ggplot(data.frame(split(sampleData$abundance,sampleData$replicate)),
aes(x=R1,y=R2)) + geom_point()
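One caveat with split: it pairs values purely by position, so it only works because the question's data list the guides in the same order for both replicates. A minimal sketch of a more defensive version, using a deliberately shuffled toy subset (the names toy, ids and wide are illustrative):

```r
# Toy subset of the question's data with the rows deliberately shuffled,
# so the guides appear in a different order within R1 and R2.
toy <- data.frame(
  sgRNA     = c("SFPQ_9", "FBXO18_13", "SFPQ_9", "DDX21_55", "DDX21_55", "FBXO18_13"),
  replicate = c("R1", "R1", "R2", "R2", "R1", "R2"),
  abundance = c(450L, 209L, 583L, 288L, 212L, 231L)
)

# Sort by guide so both replicates list the guides in the same order.
toy <- toy[order(toy$sgRNA, toy$replicate), ]

# Stop early if the two replicates do not cover exactly the same guides.
ids <- split(toy$sgRNA, toy$replicate)
stopifnot(identical(ids$R1, ids$R2))

# Now the positional pairing done by split() is safe.
wide <- data.frame(split(toy$abundance, toy$replicate))
print(wide)
```

Without the order() step, this toy data would pair the R1 value of FBXO18_13 with the R2 value of DDX21_55, and the plot would look plausible while being wrong.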
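These don't use tidyr, but use other methods to convert the data to wide format:
1) Read sampleData into a zoo object, splitting on the second column, convert it to a data frame (which will have columns R1 and R2), and use qplot from ggplot2 (note that qplot is deprecated as of ggplot2 3.4.0, though it still works):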
library(ggplot2)
library(magrittr)
library(zoo)
sampleData %>%
read.zoo(split = 2, FUN = c) %>%
as.data.frame %$%
qplot(R1, R2)
2) Another way that gives the same result is to use tapply:
library(ggplot2)
library(magrittr)
sampleData %$%
tapply(.[[3]], .[-3], c) %>%
as.data.frame.matrix %$%
qplot(R1, R2)
This can also be written without magrittr, like this:
library(ggplot2)
with(as.data.frame.matrix(tapply(sampleData[[3]], sampleData[-3], c)),
qplot(R1, R2))
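In the same spirit, base R's reshape() is one more way to go from long to wide without tidyr. A minimal sketch on a toy subset of the question's data follows; the names toy and wide are illustrative, and converting a tibble with as.data.frame() before calling reshape() is a cautious assumption rather than something the answers above state.

```r
# Toy subset: the first three guides from the question's data.
# For the real tibble, as.data.frame(sampleData) would be passed instead.
toy <- data.frame(
  sgRNA     = rep(c("SFPQ_9", "FBXO18_13", "DDX21_55"), each = 2),
  replicate = rep(c("R1", "R2"), times = 3),
  abundance = c(450L, 583L, 209L, 231L, 212L, 288L)
)

# idvar identifies a point, timevar says which column each value goes to;
# the new columns are named <value column>.<timevar level>.
wide <- reshape(toy, idvar = "sgRNA", timevar = "replicate", direction = "wide")
print(wide)
```

The resulting columns are abundance.R1 and abundance.R2, so the plot call would be ggplot(wide, aes(abundance.R1, abundance.R2)) + geom_point(). Like every other option in this thread, it is still a long-to-wide reshape.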