Is there a way to produce a scatterplot using ggplot2 from a tibble with all the values in a single column, without pivot_wider?

I have a data frame with three columns:

sampleData <- structure(list(sgRNA = c("SFPQ_9", "SFPQ_9", "FBXO18_13", "FBXO18_13", 
                         "DDX21_55", "DDX21_55", "TAF6L_11", "TAF6L_11", "NAA40_3", "NAA40_3", 
                         "KDM5A_1", "KDM5A_1", "DGKH_17", "DGKH_17", "NAA30_15", "NAA30_15", 
                         "HMG20A_8", "HMG20A_8", "CASKIN1_35", "CASKIN1_35", "NUBP1_20", 
                         "NUBP1_20", "CTCF_9", "CTCF_9", "THAP11_17", "THAP11_17", "EZH1_9", 
                         "EZH1_9", "SMARCD2_21", "SMARCD2_21", "E2F6_6", "E2F6_6", "CENPA_11", 
                         "CENPA_11", "SP140_35", "SP140_35", "SETD4_3", "SETD4_3", "STAG3_9", 
                         "STAG3_9", "RAD54B_39", "RAD54B_39", "SMC1A_59", "SMC1A_59", 
                         "ZNF257_1246", "ZNF257_1246", "DYNC1I2_4", "DYNC1I2_4", "NTC_77", 
                         "NTC_77"), replicate = c("R1", "R2", "R1", "R2", "R1", "R2", 
                                                  "R1", "R2", "R1", "R2", "R1", "R2", "R1", "R2", "R1", "R2", "R1", 
                                                  "R2", "R1", "R2", "R1", "R2", "R1", "R2", "R1", "R2", "R1", "R2", 
                                                  "R1", "R2", "R1", "R2", "R1", "R2", "R1", "R2", "R1", "R2", "R1", 
                                                  "R2", "R1", "R2", "R1", "R2", "R1", "R2", "R1", "R2", "R1", "R2"
                         ), abundance = c(450L, 583L, 209L, 231L, 212L, 288L, 958L, 1103L, 
                                          562L, 717L, 388L, 452L, 290L, 330L, 201L, 281L, 142L, 258L, 608L, 
                                          850L, 218L, 328L, 522L, 711L, 623L, 772L, 371L, 471L, 56L, 52L, 
                                          160L, 135L, 359L, 416L, 213L, 348L, 301L, 416L, 185L, 256L, 222L, 
                                          238L, 347L, 536L, 765L, 973L, 115L, 117L, 102L, 132L)), row.names = c(NA, 
                                                                                                                -50L), class = c("tbl_df", "tbl", "data.frame"))

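I would like to build a scatterplot. All the values are in the "abundance" column, "replicate" specifies whether an observation is used as the coordinate along the x axis or the y axis, and "sgRNA" identifies the point. I know I can pivot the data to a wider format to produce two new columns "R1" and "R2" and plot them against each other with ggplot2, but is there a way to do this without pivoting?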

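I agree with @www's comment. It is not clear why you do not want to pivot.

To answer your question: no, you need to reshape the data one way or another.

If you do not like pivot_wider, you can use xtabs in combination with as.data.frame.matrix: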

ggplot(as.data.frame.matrix(xtabs(abundance ~ ., data = sampleData)), aes(R1, R2)) + 
    geom_point()

But this is still a long-to-wide reshape.
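To see why this still counts as a reshape, it may help to look at the intermediate table that xtabs produces. The following is only a sketch: it uses the first three sgRNAs from the question's sampleData, and the names long, tab and wide are illustrative rather than anything from the original post.

```r
# Subset of the question's sampleData (first three sgRNAs), in long format
long <- data.frame(
  sgRNA     = rep(c("SFPQ_9", "FBXO18_13", "DDX21_55"), each = 2),
  replicate = rep(c("R1", "R2"), times = 3),
  abundance = c(450L, 583L, 209L, 231L, 212L, 288L)
)

# xtabs() sums abundance within each sgRNA x replicate cell;
# with exactly one row per cell the sums are just the original values
tab <- xtabs(abundance ~ sgRNA + replicate, data = long)

# as.data.frame.matrix() keeps the two-way layout:
# one row per sgRNA, one column per replicate
wide <- as.data.frame.matrix(tab)
print(wide)
```

Because xtabs sums within cells, this only reproduces the raw values when each sgRNA occurs once per replicate; with duplicated rows the plotted points would be totals.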

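I suppose that in some rare cases, for example when you are on a company computer and cannot access tidyr, the following approach may work. It uses only dplyr and ggplot2.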

library(dplyr)
library(ggplot2)

R1 <- sampleData %>% filter(replicate %in% "R1") %>% select(-replicate) 
R2 <- sampleData %>% filter(replicate %in% "R2") %>% select(-replicate)
R1R2 <- R1 %>% left_join(R2, by = "sgRNA", suffix = c("_R1", "_R2"))

ggplot(R1R2, aes(x = abundance_R1, y = abundance_R2)) +
  geom_point()
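If dplyr is not available either, roughly the same self-join can be sketched in base R with merge(). This is an assumption-laden illustration, not part of the original answer: it uses the first three sgRNAs from the question's data, and the names long, r1, r2 and wide are made up for the example.

```r
# Subset of the question's sampleData (first three sgRNAs), in long format
long <- data.frame(
  sgRNA     = rep(c("SFPQ_9", "FBXO18_13", "DDX21_55"), each = 2),
  replicate = rep(c("R1", "R2"), times = 3),
  abundance = c(450L, 583L, 209L, 231L, 212L, 288L)
)

# Split into one data frame per replicate, dropping the replicate column
r1 <- long[long$replicate == "R1", c("sgRNA", "abundance")]
r2 <- long[long$replicate == "R2", c("sgRNA", "abundance")]

# all.x = TRUE mirrors left_join(): keep every R1 row even without an R2 match
wide <- merge(r1, r2, by = "sgRNA", suffixes = c("_R1", "_R2"), all.x = TRUE)
print(wide)
```

The resulting columns abundance_R1 and abundance_R2 can then be mapped to x and y in the same ggplot call as above.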

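Sometimes I think this is less about writing extra code and more about reproducibility and understanding what was done, whether for yourself some time down the road or for others trying to understand what you did.

The following works, but it is not the most intuitive: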

ggplot(data.frame(split(sampleData$abundance, sampleData$replicate)),
       aes(x = R1, y = R2)) +
  geom_point()
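One caveat with the split approach is that it pairs values purely by position, so it relies on the sgRNAs appearing in the same order within R1 and within R2 (which holds for the question's data). A hedged sketch of a more defensive variant follows. It uses the first three sgRNAs from the question, deliberately shuffled, and the names long, ord and wide are illustrative.

```r
# Subset of the question's sampleData (first three sgRNAs), in long format
long <- data.frame(
  sgRNA     = rep(c("SFPQ_9", "FBXO18_13", "DDX21_55"), each = 2),
  replicate = rep(c("R1", "R2"), times = 3),
  abundance = c(450L, 583L, 209L, 231L, 212L, 288L)
)

# Same rows, deliberately shuffled so the sgRNA order differs between replicates
long <- long[c(1, 4, 5, 2, 3, 6), ]

# Positional pairing is only valid if every sgRNA has exactly one R1 and one R2
stopifnot(all(table(long$sgRNA, long$replicate) == 1))

# Sort by sgRNA first so that position i refers to the same sgRNA in both vectors
ord  <- long[order(long$sgRNA, long$replicate), ]
wide <- data.frame(split(ord$abundance, ord$replicate))
wide$sgRNA <- unique(ord$sgRNA)
print(wide)
```

After sorting, wide has the columns R1 and R2 that the aes(x = R1, y = R2) mapping expects.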

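These do not use tidyr, but they use other approaches to convert the data to wide format:

1) Read sampleData into a zoo object split on the second column, convert that to a data frame (which will have columns R1 and R2) and use qplot from ggplot2: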

library(ggplot2)
library(magrittr)
library(zoo)

sampleData %>%
  read.zoo(split = 2, FUN = c) %>%
  as.data.frame %$%
  qplot(R1, R2)

2) Another approach that gives the same result is to use tapply:

library(ggplot2)
library(magrittr)

sampleData %$%
  tapply(.[[3]], .[-3], c) %>%
  as.data.frame.matrix %$% 
  qplot(R1, R2)

This could also be written without magrittr like this:

library(ggplot2)

with(as.data.frame.matrix(tapply(sampleData[[3]], sampleData[-3], c)),
  qplot(R1, R2))
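Note that qplot() has been deprecated in recent ggplot2 releases (3.4.0 onward, as far as I know), so the same tapply-based reshape can be handed to ggplot() instead. This is a sketch under that assumption, using the first three sgRNAs from the question. The names long, m, wide and p are illustrative.

```r
library(ggplot2)

# Subset of the question's sampleData (first three sgRNAs), in long format
long <- data.frame(
  sgRNA     = rep(c("SFPQ_9", "FBXO18_13", "DDX21_55"), each = 2),
  replicate = rep(c("R1", "R2"), times = 3),
  abundance = c(450L, 583L, 209L, 231L, 212L, 288L)
)

# tapply() with two grouping variables returns a matrix:
# sgRNA in rows, replicate in columns
m <- tapply(long$abundance, long[c("sgRNA", "replicate")], c)

# Convert the matrix to a data frame with columns R1 and R2
wide <- as.data.frame(m)

# Same scatterplot as qplot(R1, R2), written with ggplot()
p <- ggplot(wide, aes(x = R1, y = R2)) +
  geom_point()
print(p)
```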