Filling bar colours with the mean of another continuous variable in ggplot2 histograms

Question

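I have a municipality-level dataset. I want to plot a histogram of a given variable while filling the bars according to another continuous variable (using a colour gradient). The reason is that I believe municipalities with low values of the variable I am plotting have, on average, very different population sizes from municipalities at the upper end of the distribution.

Using the mtcars data, suppose I want to plot the distribution of mpg and fill each histogram bar with a continuous colour that represents the mean of wt within that bar. I typed the code below, but I don't know how to make the fill option take the mean of wt. I would like the legend to show a colour gradient, so that it tells me whether the mean of wt in each bar is relatively low, medium, or high.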

library(tidyverse)

mtcars %>%
  ggplot(aes(x = mpg, fill = wt)) +
  geom_histogram()

Answer 1

It's not exactly a histogram, but it's the closest thing I could come up with for your problem:

library(tidyverse)

mtcars %>%
  # Create breaks for mpg; this sequence is just an example
  mutate(mpg_cut = cut(mpg, seq(10, 35, 5))) %>%
  # Count and mean of wt by mpg_cut
  group_by(mpg_cut) %>%
  summarise(
    n = n(),
    wt = mean(wt)
  ) %>%
  ggplot(aes(x = mpg_cut, fill = wt)) +
  # Bar plot with no gaps between bars
  geom_col(aes(y = n), width = 1)
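
To see what geom_col is actually drawing, it can help to look at the summarised table before plotting it. The snippet below is only a sketch of that check, using the same example breaks as above; the name `binned` is mine and not part of the answer.

```r
library(dplyr)

# Same binning as the answer above: right-closed bins (10,15], (15,20], ...
binned <- mtcars %>%
  mutate(mpg_cut = cut(mpg, seq(10, 35, 5))) %>%
  group_by(mpg_cut) %>%
  summarise(n = n(), wt = mean(wt))

# One row per bin: the bar height (n) and the value mapped to fill (wt)
print(binned)

# Sanity checks: no car fell outside the breaks, and every car is counted once
stopifnot(!anyNA(binned$mpg_cut), sum(binned$n) == nrow(mtcars))
```

If you change the breaks and the first check fails, some mpg values fall outside the range of the breaks and end up in an NA bin.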

Answer 2

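If you want a true histogram, you need to summarise the data first and then draw it with geom_col rather than geom_histogram. The base R function hist will help you here to generate the breaks and midpoints: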

library(ggplot2)
library(dplyr)

mtcars %>% 
  mutate(mpg = cut(x      = mpg, 
                   breaks = hist(mpg, breaks = 0:4 * 10, plot = FALSE)$breaks,
                   labels = hist(mpg, breaks = 0:4 * 10, plot = FALSE)$mids)) %>%
  group_by(mpg) %>%
  summarize(n = n(), wt = mean(wt)) %>%
  ggplot(aes(x = as.numeric(as.character(mpg)), y = n, fill = wt)) +
  scale_x_continuous(limits = c(0, 40), name = "mpg") +
  geom_col(width = 10) +
  theme_bw()
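
Since the question asks for a legend that reads as low-medium-high, one option is a diverging fill scale centred on the overall mean of wt. This is a sketch of a variation on the code above, not the only way to do it: the colours, the choice of midpoint, and the names `breaks`, `binned` and `p` are my own assumptions, and the bin midpoints are computed from the break vector instead of from hist.

```r
library(ggplot2)
library(dplyr)

# Example breaks, same 10-wide bins as above
breaks <- seq(0, 40, by = 10)

binned <- mtcars %>%
  # labels = FALSE returns the integer index of the bin each car falls in
  mutate(bin = cut(mpg, breaks = breaks, labels = FALSE)) %>%
  group_by(bin) %>%
  summarise(n = n(), wt = mean(wt)) %>%
  # Midpoint of each bin = its left break plus half the bin width
  mutate(mid = breaks[bin] + 5)

p <- ggplot(binned, aes(x = mid, y = n, fill = wt)) +
  geom_col(width = 10) +
  scale_x_continuous(limits = c(0, 40), name = "mpg") +
  # Assumed palette: blue below the overall mean of wt, red above it
  scale_fill_gradient2(low = "steelblue", mid = "grey90", high = "firebrick",
                       midpoint = mean(mtcars$wt), name = "mean wt") +
  theme_bw()

print(p)
```

Bars whose mean wt is close to the overall mean come out near grey, so the legend can be read as low, medium, high relative to the whole dataset.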

r

histogram

ggplot2