ggplot2 - a custom histogram with a rug plot

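I am trying to create a custom histogram with a rug plot along the x-axis that shows the raw values.

I'll use the mtcars dataset to illustrate. It isn't the best dataset for this problem, but hopefully the reader will understand what I'm trying to achieve.

The basic histogram, without any rug-plot attempt, is shown below.

I want to build the histogram with geom_bar, because that gives more flexibility with custom bins.

I also want a small gap between the histogram bars (i.e. width = 0.95), which adds to the complexity of this problem.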

library(dplyr)
library(ggplot2)

# create custom bins
vct_seq <- c(seq(from = 10, to = 25, by = 5), 34)
mtcars$bin <- cut(mtcars$mpg, breaks = vct_seq)

# create data.frame for the ggplot graph..using bins above
df_mtcars_count <- mtcars %>% group_by(bin) %>% summarise(count = n())

# indicative labels
vct_labels <- c("bin 1", "bin 2", "bin 3", "bin 4")

# attempt 1 - basic plot -- no rug plot
p <- ggplot(data = df_mtcars_count, aes(x = bin, y = count))
p <- p + geom_bar(stat = "identity", width = 0.95)
p <- p + geom_text(aes(label = count), vjust = -0.5)
p <- p + scale_x_discrete("x title to go here", labels = df_mtcars_count$bin, breaks = df_mtcars_count$bin)
p

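Next, an attempt to add a basic rug plot along the x-axis. This clearly doesn't work, because geom_bar and geom_rug use completely different x scales: the bars sit on a discrete scale of bins, while the rug uses the continuous mpg values.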

# attempt 2 with no scaling.... doesn't work as x scale for ordinal (bins) and 
# x scale for continuous (mpg) do not match
p <- ggplot(data = df_mtcars_count, aes(x = bin, y = count))
p <- p + geom_bar(stat = "identity", width = 0.95)
p <- p + geom_text(aes(label = count), vjust = -0.5)
p <- p + scale_x_discrete("x title to go here", labels = df_mtcars_count$bin, breaks = df_mtcars_count$bin)
p <- p + geom_rug(data = mtcars, aes(x = mpg), inherit.aes = FALSE, alpha = 0.3)
p

Now, an attempt to rescale the mpg column to match the ordinal scale.

First, define a linear mapping function:

fn_linear_map <- function(vct_existing_val, vct_new_range) {
  # example....converts 1:20 into the range 1 to 10 like this:
  # fn_linear_map(1:20, c(1, 10))
  fn_r_diff <- function(x) x %>% range() %>% diff()
  flt_ratio <- fn_r_diff(vct_new_range) / fn_r_diff(vct_existing_val)
  vct_old_min_offset <- vct_existing_val  - min(vct_existing_val)
  vct_new_range_val <- (vct_old_min_offset * flt_ratio) + min(vct_new_range)
  return(vct_new_range_val)
}
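The pipe inside `fn_r_diff()` is the only reason this helper needs dplyr (or magrittr) loaded. A minimal base-R restatement with the same arithmetic, under the hypothetical name `fn_linear_map_base()`, can be checked against the example given in its comment:

```r
# Hypothetical base-R restatement of the question's fn_linear_map():
# the same arithmetic, without the magrittr pipe, so dplyr need not be loaded
fn_linear_map_base <- function(vct_existing_val, vct_new_range) {
  # ratio of the target range's width to the input range's width
  flt_ratio <- diff(range(vct_new_range)) / diff(range(vct_existing_val))
  # shift the input minimum to 0, rescale, then shift to the target minimum
  (vct_existing_val - min(vct_existing_val)) * flt_ratio + min(vct_new_range)
}

# the example from the original comment: squeeze 1:20 into the range 1 to 10
vct_mapped <- fn_linear_map_base(1:20, c(1, 10))
range(vct_mapped)
```

Because it is one straight line over the whole data range, it sends the smallest and largest inputs to the two ends of the target range and knows nothing about where the bin boundaries fall.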

Now apply the function: map mpg onto the range 1 to 4, in an attempt to match the ordinal scale.

mtcars$mpg_remap <- fn_linear_map(mtcars$mpg, c(1, 4))

Trying the plot again: it's getting closer, but still not quite right.

# attempt 3:  getting closer but doesn't really match the ordinal scale
p <- ggplot(data = df_mtcars_count, aes(x = bin, y = count))
p <- p + geom_bar(stat = "identity", width = 0.95)
p <- p + geom_text(aes(label = count), vjust = -0.5)
p <- p + scale_x_discrete("x title to go here", labels = df_mtcars_count$bin, breaks = df_mtcars_count$bin)
p <- p + geom_rug(data = mtcars, aes(x = mpg_remap), inherit.aes = FALSE, alpha = 0.3)
p

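The chart above is close to what I want, but the rug plot does not line up with the actual data. For example, the largest observation (33.9) should appear almost flush with the right-hand edge of the last bar. See below: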

mtcars %>% filter(bin == "(25,34]") %>% arrange(mpg) %>% dplyr::select(mpg, mpg_remap)
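Attempt 3 misses because fn_linear_map() stretches one straight line over the whole data range: the smallest mpg (10.4) lands on x = 1 and the largest (33.9) on x = 4, which are the centres of the first and last bars rather than their edges, and the 9-unit last bin is treated like the 5-unit ones. One way to make the rug at least consistent with the bars is to map each value within its own bin. The sketch below does that in base R; `fn_bin_map()` and the column `mpg_binmap` are hypothetical names, and it assumes right-closed bins as produced by cut() with its default right = TRUE, with bars drawn at x = 1, 2, ... as on the discrete scale.

```r
# Hypothetical helper, not part of ggplot2: place each raw value inside the bar
# of the bin it falls in. Assumes right-closed bins (lower, upper], matching
# cut()'s default right = TRUE, and bars centred at x = 1, 2, ... with this width.
fn_bin_map <- function(x, breaks, width = 0.95) {
  # index k of the bin with breaks[k] < x <= breaks[k + 1]
  k <- findInterval(x, breaks, left.open = TRUE)
  # values outside the outermost breaks have no bar: mark them NA
  k[k < 1 | k >= length(breaks)] <- NA
  lower <- breaks[k]
  upper <- breaks[k + 1]
  # relative position of x inside its own bin, in (0, 1]
  frac <- (x - lower) / (upper - lower)
  # left edge of bar k, plus that fraction of the bar width
  k - width / 2 + width * frac
}

vct_seq <- c(seq(from = 10, to = 25, by = 5), 34)
mtcars$mpg_binmap <- fn_bin_map(mtcars$mpg, vct_seq, width = 0.95)

# the largest value, 33.9, now lands at about 4.46,
# just inside the right edge of bar 4 (4 + 0.95 / 2 = 4.475)
max(mtcars$mpg_binmap)
```

To plot it, use aes(x = mpg_binmap) instead of aes(x = mpg_remap) in the geom_rug() call of attempt 3. The x axis is still not a true mpg scale: each bar's 0.95 units cover either 5 or 9 mpg, so rug positions are only comparable within a bar, not across bars.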

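Your scale makes no sense to me, because you are drawing a bin that is nearly twice as wide as the others (the last bin spans 9 mpg units, the rest 5) with the same bar width. Combining that with a rug is confusing at best and misleading at worst. I suggest drawing the bars with their correct widths; after that, the rug becomes trivial.

I think the best solution is simply to use geom_histogram: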

ggplot(mtcars, aes(mpg)) + 
  geom_histogram(breaks = vct_seq, col = 'grey80') +
  geom_rug(aes(mpg, y = NULL))

If you really want gaps between the bars, it takes a bit more work:

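library(dplyr)
library(tidyr)
library(ggplot2)

# assumes mtcars$bin was created with cut() as in the question; needs readr installed
d <- mtcars %>% 
  count(bin) %>%                                                  # one row per bin, count in column n
  separate(bin, c('min', 'max'), sep = ',', remove = FALSE) %>%   # "(10,15]" -> "(10" and "15]"
  mutate_at(vars('min', 'max'), readr::parse_number) %>%          # strip the brackets -> 10 and 15
  mutate(
    middle = min + (max - min) / 2,   # bar centre
    width = 0.9 * (max - min)         # 90% of the bin width leaves a gap between bars
  )

# bars at their true positions and widths, so the rug can use the raw mpg values
ggplot(d, aes(middle, n)) + 
  geom_col(width = d$width) +
  geom_rug(aes(mpg, y = NULL), mtcars)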
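The pipeline above recovers min and max by parsing the factor labels with readr::parse_number(). Since the breaks are already available in vct_seq, a base-R alternative can compute the same columns directly from them. This is a sketch; `d_bars` is a hypothetical name, and it assumes the same right-closed bins that cut() produces by default.

```r
# Sketch: build the bar geometry straight from the break points,
# instead of parsing them back out of the factor labels. Base R only.
vct_seq <- c(seq(from = 10, to = 25, by = 5), 34)

d_bars <- data.frame(
  # counts per right-closed bin, in bin order (a bin with no data would show 0)
  n      = as.vector(table(cut(mtcars$mpg, breaks = vct_seq))),
  # bar centre = left break + half the bin width
  middle = head(vct_seq, -1) + diff(vct_seq) / 2,
  # bar width = 90% of the bin width, leaving a gap between bars
  width  = 0.9 * diff(vct_seq)
)
d_bars
```

The result has the same n, middle and width columns as d, so it can take d's place in the ggplot() call above. Because table() reports bins with no observations as 0, every bin gets a row.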