Boxplotting many variables over different pages

Question

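Boxplots for a large dataset (in a single run)

How do I go from code that throws all 280 boxplots into one figure to code that separates each variable and produces 280 different plots?

All the examples I have found include fewer than 5 variables, which makes the output easy to handle and read... but how do you handle the variables to plot when there are more than 15?

I have a large dataset (long format = 77560 observations, 3 variables; wide format = 280 observations, 278 variables). The dataset contains clinical data and measurements from 2 groups, patients and controls.

My goal is to boxplot patients versus controls for every variable and get the results in separate plots (separate output windows).

I would like a single piece of code, rather than doing this 280 times.

How can 280 boxplots be output in a more sensible way?

Thanks!

The code I am using is this: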

ggplot(long_df, aes(x=variable, y=value)) + geom_boxplot(aes(fill=group))

This is what 4 rows of 15 columns look like:

    df <- structure(list(group = c("control", "control", "patient", 
"patient"), `Scale factor` = c(0.80696, 0.8002, 0.73286, 0.83765
), SNR = c(19.1027, 17.8508, 19.2552, 15.002), mSNR = c(20.2588, 
18.9367, 20.1892, 16.1166), `ICV cm3` = c(1461.351, 1426.9219, 
1350.5229, 1565.7709), `Cerebellum total cm3` = c(128.4798, 125.1114, 
124.4808, 143.9827), `Cerebellum right cm3` = c(64.2286, 62.7666, 
62.0081, 71.7966), `Cerebellum left cm3` = c(64.2512, 62.3449, 
62.4727, 72.1861), `Cerebellum total %` = c(8.7919, 8.7679, 9.2172, 
9.1956), `Cerebellum right %` = c(4.3952, 4.3987, 4.5914, 4.5854
), `Cerebellum left %` = c(4.3967, 4.3692, 4.6258, 4.6103), `Cerebellum asymmetry` = c(-0.035173, 
0.67412, -0.74651, -0.54105), `I-II total cm3` = c(0.11782, 0.10723, 
0.090875, 0.13486), `I-II right cm3` = c(0.058101, 0.056814, 
0.043239, 0.069525), `I-II left cm3` = c(0.059715, 0.050412, 
0.047636, 0.065337)), row.names = c(NA, -4L), class = c("tbl_df", 
"tbl", "data.frame"))

From wide to long:

long_df <- reshape2::melt(df, id.vars = "group")  # melt() is assumed here to come from reshape2
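For readers without reshape2, the same wide-to-long step can be sketched with `tidyr::pivot_longer()`. This is only an illustration on three columns of the sample data above; the names `wide` and `long` are made up for the example, and the output column names are set to `variable` and `value` so the plotting code in this thread would still apply.

```r
library(tidyr)

# A few columns from the sample data in the question
wide <- data.frame(
  group = c("control", "control", "patient", "patient"),
  SNR   = c(19.1027, 17.8508, 19.2552, 15.002),
  mSNR  = c(20.2588, 18.9367, 20.1892, 16.1166)
)

# One row per (subject, variable) pair; every column except group is stacked
long <- pivot_longer(wide, cols = -group,
                     names_to = "variable", values_to = "value")

print(long)
```

One difference worth knowing: `pivot_longer()` returns `variable` as a character column, whereas `melt()` returns a factor, so facet or axis order may differ between the two.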

How can I get from this ...

... to 280 regular boxplots like this ...

Answer 1

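One option is to standardize the data. This means each variable is rescaled to z-scores (mean 0, standard deviation 1), so all variables share a comparable scale while the relative differences within each variable are preserved. Note that the standardized values are not confined to the range -1 to 1.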

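library(dplyr)
library(ggplot2)
# Standardize every measurement column (z-scores); group is left untouched
df <- df %>% mutate(across(-group, ~ as.vector(scale(.x))))
# data.table::melt() expects a data.table, so convert the tibble first
long_df <- data.table::melt(data.table::as.data.table(df), id.vars = "group")
ggplot(long_df, aes(x = variable, y = value)) + geom_boxplot(aes(fill = group)) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))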
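To make the standardization step concrete, here is a small base R sketch of what `scale()` computes for one column, using the four SNR values from the sample data in the question. The names `snr`, `z_manual` and `z_scale` are made up for this illustration.

```r
# SNR values of the four sample rows in the question
snr <- c(19.1027, 17.8508, 19.2552, 15.002)

# z-score by hand: subtract the mean, divide by the sample standard deviation
z_manual <- (snr - mean(snr)) / sd(snr)

# scale() returns a one-column matrix; as.vector() turns it back into a plain vector
z_scale <- as.vector(scale(snr))

print(round(z_manual, 2))
print(all.equal(z_manual, z_scale))
```

The last value comes out below -1, which shows that the standardized data are centered on 0 with unit standard deviation but are not limited to a fixed interval.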

Answer 2

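Use a for loop over the pages produced by facet_wrap_paginate() (from the ggforce package).

(Other facet_wrap_ options may work as well.)

Note that the data frame must be in long format (long_df).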

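## THE BOXPLOT
############
library(dplyr)    # %>%
library(ggplot2)
library(ggforce)  # facet_wrap_paginate()
library(ggpubr)   # stat_compare_means()

# n = total number of pages, at 3 x 2 = 6 panels per page
n <- ceiling(length(unique(long_df$variable)) / 6)

for (i in seq_len(n)) {
  plots <- long_df %>%
    ggplot(aes(group, value)) +
    geom_boxplot(aes(color = group)) +
    geom_jitter(width = 0.10) +
    stat_compare_means(method = "t.test", paired = FALSE) +  # adds a t-test p-value
    facet_wrap_paginate(~ variable, ncol = 3, nrow = 2, page = i, scales = "free")
  print(plots)
}

## FINISH ##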
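The loop above prints each page to the active graphics device. If the pages should end up in a file instead, one possible sketch is to wrap the same loop in `pdf()` and `dev.off()`, so that every `print()` call becomes one page of a multi-page PDF. The toy data, the file name `boxplots.pdf` and the variable names below are assumptions made for this illustration, not part of the original answer, and the t-test layer is left out to keep the example small.

```r
library(ggplot2)
library(ggforce)

# Toy long-format data standing in for long_df (hypothetical values)
set.seed(1)
toy_long <- data.frame(
  group    = rep(c("control", "patient"), times = 60),
  variable = rep(paste0("var", 1:12), each = 10),
  value    = rnorm(120)
)

per_page    <- 6  # 3 columns x 2 rows per page
total_pages <- ceiling(length(unique(toy_long$variable)) / per_page)

out_file <- file.path(tempdir(), "boxplots.pdf")  # hypothetical output path

pdf(out_file, width = 10, height = 7)  # open a multi-page PDF device
for (i in seq_len(total_pages)) {
  p <- ggplot(toy_long, aes(group, value)) +
    geom_boxplot(aes(color = group)) +
    facet_wrap_paginate(~ variable, ncol = 3, nrow = 2, page = i, scales = "free")
  print(p)  # each print() starts a new page in the PDF
}
invisible(dev.off())  # close the device so the file is written
```

With the real data, `toy_long` would be replaced by `long_df`. Setting `ncol = 1` and `nrow = 1` (and `per_page <- 1`) should give one variable per page, which is closer to the 280 separate plots asked for in the question.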

Example of the result:

r

bigdata

ggplot2

boxplot
