Histogram not showing correct count/values? (Histogram vs Geom Freqpoly)

Question

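I have a dataset of the 2002 New York Marathon with each runner's finishing place. I also have each runner's gender.

When I plot a histogram grouped by gender, the counts for females are off!

When I plot a frequency polygon (geom_freqpoly), the distribution is in line with what I would expect from the data.

Can anyone explain this discrepancy? The red bars are females and the blue bars are males. The same colors apply to the freqpoly plot.

The red line shows where the count for female runners should be, but the histogram shows much higher values. Why?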

Answer 1

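To elaborate on what teunbrand said in the comments: the problem is that your histogram bars are stacked on top of each other. This is because the default position argument of geom_histogram is position = "stack". This is in contrast to geom_freqpoly, which defaults to position = "identity". With stacking, the top of each female bar sits at the male count plus the female count for that bin, so the female bars appear to reach much higher than the female frequency polygon.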
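The stacking can be checked numerically rather than by eye. The following is a minimal sketch, not the asker's data: `df`, its columns `x` and `g`, and the group sizes are made-up stand-ins, and `bins = 10` is an arbitrary choice. It uses `layer_data()` from ggplot2 to pull out the values each layer actually draws and compares the bar tops (`ymax`) with the raw per-group counts (`count`).

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-in data: two groups of unequal size
df <- data.frame(
  x = c(rnorm(300), rnorm(100)),
  g = rep(c("Male", "Female"), times = c(300, 100))
)

# Same histogram twice: default position ("stack") vs. position = "identity"
p_stack <- ggplot(df, aes(x, fill = g)) +
  geom_histogram(bins = 10)
p_ident <- ggplot(df, aes(x, fill = g)) +
  geom_histogram(bins = 10, position = "identity")

# layer_data() returns the computed data for a layer, after the
# position adjustment has been applied
d_stack <- layer_data(p_stack)
d_ident <- layer_data(p_ident)

# With "identity", every bar starts at 0 and ends at its own count
ident_ok <- all(d_ident$ymin == 0) && all(d_ident$ymax == d_ident$count)

# With "stack", the highest bar top in each bin equals the sum of the
# group counts in that bin
stack_top <- tapply(d_stack$ymax, d_stack$x, max)
bin_total <- tapply(d_stack$count, d_stack$x, sum)
stack_ok <- isTRUE(all.equal(as.numeric(stack_top), as.numeric(bin_total)))

print(c(identity_matches_counts = ident_ok, stack_top_is_bin_total = stack_ok))
```

The `count` column is the same in both cases; only `ymin` and `ymax` differ, which is why the frequency polygon and the stacked histogram disagree even though they are built from the same bins.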

So all you need to do is add position = "identity":

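library(ggplot2)

# nym.2002 ships with the UsingR package, which must be installed
data(nym.2002, package = "UsingR")
ggplot(nym.2002, aes(x = place)) +
  geom_freqpoly(aes(color = gender)) +
  # position = "identity" overlays the bars instead of stacking them
  geom_histogram(aes(fill = gender),
                 alpha = 0.2,
                 position = "identity")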

If you look at help(geom_freqpoly), you can find the default arguments for yourself.
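As an alternative to reading the help page, the defaults can also be inspected from the console. This is a small sketch that assumes both geoms are ordinary R functions with `position` as a named argument, as they are documented; `formals()` is base R, and the variable names are only illustrative.

```r
library(ggplot2)

# formals() lists a function's arguments together with their default values
hist_default <- formals(geom_histogram)$position
freqpoly_default <- formals(geom_freqpoly)$position

print(hist_default)
print(freqpoly_default)
```

If the two printed values differ, the two geoms will place grouped data differently unless position is set explicitly.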

Answer 2

Not an answer, but a visualization of the different position options discussed in Ian Campbell's and teunbrand's answers.


library(ggplot2)
set.seed(1)
p1 <- ggplot()+
  geom_histogram(data = data.frame(x = rnorm(100), g = rep(1:2, 50)), aes(x, fill = factor(g)), position = "dodge")+
  ggtitle("position = dodge")

set.seed(1)
p2 <- ggplot()+
  geom_histogram(data = data.frame(x = rnorm(100), g = rep(1:2, 50)), aes(x, fill = factor(g)), position = "identity")+
  ggtitle("position = identity")

set.seed(1)
p3 <- ggplot()+
  geom_histogram(data = data.frame(x = rnorm(100), g = rep(1:2, 50)), aes(x, fill = factor(g)))+
  ggtitle("position = stack")


library(patchwork)

p1/p2/p3
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Created on 2020-07-11 by the reprex package (v0.3.0)
