cdplot() analog in ggplot2

Question

I'm looking for a conditional density plot like R's built-in cdplot function, but using ggplot2.

Here is a vanilla cdplot example:

with(iris, cdplot(Sepal.Length, Species))

The ggplot2 book (p. 188) says that the following calls should be equivalent:

cdplot(x, y)
qplot(x, fill=y, geom="density", position="fill")

However, this behavior seems to have broken in some ggplot2 update (it also emits a warning that `position` is deprecated):

with(iris, qplot(Sepal.Length, fill=Species, geom="density", position="fill"))

I found a blog entry by someone trying to do the same thing, but apparently it is now broken too (same warning, `position` is deprecated):

cdens <- cdplot(iris$Sepal.Length, iris$Species, plot = F)
x <- seq(min(iris$Sepal.Length), max(iris$Sepal.Length), length.out = 100)
y <- c(cdens[[1]](x), cdens[[2]](x), rep(1, length(x)))
type <- ordered(rep(levels(iris$Species), each = length(x)),
                levels=rev(levels(iris$Species)))
x <- rep(x, 3)
qplot(x, y, geom="area", fill = type, position="identity",
      xlab="Sepal.Length", ylab="Species") + theme_bw()
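qplot() warns that `position` is deprecated and then appears to ignore it, so the requested stacking/overlap never reaches the layer. One way to keep the blog's approach is to build the data frame explicitly and call ggplot() with geom_area(position = "identity"). This is a minimal sketch assuming a reasonably recent ggplot2; the names `lv`, `cd_df` and `p` are illustrative, and the data construction follows the blog code above.

```r
library(ggplot2)

# With plot = FALSE, cdplot() returns one function per level except the last;
# each gives the cumulative conditional probability P(Y <= level | x).
cdens <- cdplot(iris$Sepal.Length, iris$Species, plot = FALSE)

x <- seq(min(iris$Sepal.Length), max(iris$Sepal.Length), length.out = 100)
lv <- levels(iris$Species)

cd_df <- data.frame(
  x = rep(x, times = length(lv)),
  # cumulative upper boundary of each band; the last level always reaches 1
  y = c(cdens[[1]](x), cdens[[2]](x), rep(1, length(x))),
  # reversed levels so the full-height band should be drawn first and the
  # narrower cumulative bands are painted on top of it
  type = factor(rep(lv, each = length(x)), levels = rev(lv))
)

p <- ggplot(cd_df, aes(x, y, fill = type)) +
  geom_area(position = "identity") +
  labs(x = "Sepal.Length", y = "Species", fill = "Species") +
  theme_bw()
```

Print `p` to draw it. Because the bands are cumulative and overlaid with `position = "identity"`, no stacking is needed.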

Is there a way to do this? What is wrong with these examples?

(I want a ggplot solution because it gives better axis labels and legends, especially when the independent variable is a date.)

Update: In the comments below, @bouncyball suggested `ggplot(iris, aes(x = Sepal.Length, fill = Species)) + geom_density(position = 'fill')`, but that does something different:

with(data, cdplot(time, cat))
abline(v=as.POSIXct(c('2017-04-01', '2017-03-01')), col='red')

ggplot(data, aes(x=time, fill=cat)) + geom_density(position = 'fill')

The cdplot result is what I want; I'm not sure what the ggplot example is doing. The cdplot result matches the factor ratios, e.g. for March 2017:

> with(subset(data, time>'2017-03-01' & time <'2017-04-01'), table(cat))
cat
   <1s    <3s    <5s   <10s   <20s    <1m    <2m    <5m    <1h   <24h    >1d 
175484  31837  19078  16146  15013  20200   1142   1207    944     17      0
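To make "matches the factor ratios" concrete: at each x, cdplot() stacks conditional proportions that sum to 1. The class shares for March 2017 can be computed from the counts in the table above. This is a month-wide average, whereas the plot shows a smoothed estimate at each time point, so treat it as a rough check only.

```r
# Counts copied from the March 2017 table above
counts <- c(`<1s` = 175484, `<3s` = 31837, `<5s` = 19078, `<10s` = 16146,
            `<20s` = 15013, `<1m` = 20200, `<2m` = 1142, `<5m` = 1207,
            `<1h` = 944, `<24h` = 17, `>1d` = 0)

share <- prop.table(counts)  # share of each category within March
cum_share <- cumsum(share)   # stacked band boundaries, bottom to top
round(cum_share, 3)
```

The `<1s` band alone covers roughly 62% of that month, which is the height a correct conditional density plot should show for it around March 2017.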

Answer 1

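Not sure whether it needs to be more complicated than this, but you can combine position_fill with geom_density. Here are two versions: one with the usual fill legend, and one with labels placed at each species' maximum Sepal.Length. You could position the labels differently or skip them altogether; I was just trying to roughly mirror the setup of cdplot.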

library(tidyverse)

iris %>%
  ggplot(aes(x = Sepal.Length, fill = Species)) +
    # linewidth replaces size (ggplot2 >= 3.4.0); expansion() replaces expand_scale()
    geom_density(position = position_fill(), linewidth = 0) +
    theme_bw() +
    scale_fill_brewer(palette = "Set2") +
    scale_x_continuous(expand = expansion(0)) +
    scale_y_continuous(expand = expansion(0))

lbls <- iris %>%
  group_by(Species) %>%
  summarise(max_sl = max(Sepal.Length))

iris %>%
  ggplot(aes(x = Sepal.Length, fill = Species)) +
  geom_density(position = position_fill(), linewidth = 0) +
  # one label per species, right-aligned at that species' largest Sepal.Length
  geom_text(aes(x = max_sl, y = 1, label = Species), data = lbls,
            hjust = 1, vjust = 1, nudge_y = -0.02, nudge_x = -0.05,
            color = "white", fontface = "bold") +
  theme_bw() +
  scale_fill_brewer(palette = "Set2", guide = "none") +
  scale_x_continuous(expand = expansion(0)) +
  scale_y_continuous(expand = expansion(0))

Answer 2

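Plot stacked densities using the computed variable count, and reorder the species levels to match cdplot. By default geom_density() scales each group's density to integrate to 1, so position = "fill" compares only the shapes of the groups and ignores how many observations each has. count (density times the number of points in the group) weights each group by its size, which is what cdplot's conditional probabilities require.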

library(ggplot2)
# after_stat(count) replaces the older ..count.. notation
ggplot(iris, aes(Sepal.Length, after_stat(count),
                 fill = forcats::fct_relevel(Species,
                   c("virginica", "versicolor", "setosa")))) +
  geom_density(position = "fill") +
  labs(fill = "Species")
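To see why count is the right quantity, here is a sketch that recomputes cdplot()'s estimate by hand. It assumes the approach described in ?cdplot: conditional densities of x given each level of y, weighted by the marginal distribution of y, i.e. P(Species = k | x) = P(Species = k) × f_k(x) / f(x), with every density estimated by stats::density() on one shared bandwidth and grid. The names `n_grid`, `cond`, `cd_df` and `p` are illustrative.

```r
library(ggplot2)

x <- iris$Sepal.Length
y <- iris$Species
lv <- levels(y)
n_grid <- 512  # cdplot()'s default n

# Marginal density f(x); its bandwidth (default "nrd0") and grid are reused below
dx <- density(x, n = n_grid)
grid <- dx$x

# Column k holds P(Y = k) * f_k(x) / f(x), with f_k on the same bandwidth and grid
cond <- sapply(lv, function(k) {
  dk <- density(x[y == k], bw = dx$bw, n = n_grid,
                from = min(grid), to = max(grid))
  mean(y == k) * dk$y / dx$y
})

cd_df <- data.frame(
  x = rep(grid, times = length(lv)),
  p = as.vector(cond),
  # reversed levels: position_stack() puts the first level on top,
  # so setosa ends up at the bottom as in cdplot()
  Species = factor(rep(lv, each = n_grid), levels = rev(lv))
)

p <- ggplot(cd_df, aes(x, p, fill = Species)) +
  geom_area(position = "stack") +
  labs(x = "Sepal.Length", y = "P(Species | Sepal.Length)") +
  theme_bw()
```

Because all densities share one bandwidth, the columns of `cond` sum to 1 at every grid point. The after_stat(count) version above computes essentially the same ratio, n_k f_k(x) / Σ_j n_j f_j(x), except that geom_density() chooses a bandwidth separately for each group, so its plot can differ slightly from cdplot's.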

Tags: r, distribution, ggplot2