ggplot2 - a custom histogram with a rug plot
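I am trying to create a custom histogram with a rug plot along the x-axis that shows the raw values.
I'll use the mtcars dataset to illustrate. It isn't the best dataset for this problem, but hopefully the reader can see what I'm trying to achieve.
The basic histogram is shown below, without any attempt at a rug plot.
I want to build the histogram with geom_bar, because that gives more flexibility with custom bins.
I also want a small gap between the histogram bars (i.e. width = 0.95), which adds to the complexity of the problem.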
library(dplyr)
library(ggplot2)
# create custom bins
vct_seq <- c(seq(from = 10, to = 25, by = 5), 34)
mtcars$bin <- cut(mtcars$mpg, breaks = vct_seq)
# create data.frame for the ggplot graph..using bins above
df_mtcars_count <- mtcars %>% group_by(bin) %>% summarise(count = n())
# indicative labels
vct_labels <- c("bin 1", "bin 2", "bin 3", "bin 4")
# attempt 1 - basic plot -- no rug plot
p <- ggplot(data = df_mtcars_count, aes(x = bin, y = count))
p <- p + geom_bar(stat = "identity", width = 0.95)
p <- p + geom_text(aes(label = count), vjust = -0.5)
p <- p + scale_x_discrete("x title to go here", labels = df_mtcars_count$bin, breaks = df_mtcars_count$bin)
p
Next, try adding a basic rug plot on the x-axis. This obviously doesn't work, because geom_bar and geom_rug are on completely different scales.
# attempt 2 with no scaling.... doesn't work as x scale for ordinal (bins) and
# x scale for continuous (mpg) do not match
p <- ggplot(data = df_mtcars_count, aes(x = bin, y = count))
p <- p + geom_bar(stat = "identity", width = 0.95)
p <- p + geom_text(aes(label = count), vjust = -0.5)
p <- p + scale_x_discrete("x title to go here", labels = df_mtcars_count$bin, breaks = df_mtcars_count$bin)
p <- p + geom_rug(data = mtcars, aes(x = mpg), inherit.aes = F, alpha = 0.3)
p
Now, try rescaling the mpg column to match the ordinal scale.
First define a linear mapping function:
fn_linear_map <- function(vct_existing_val, vct_new_range) {
  # example....converts 1:20 into the range 1 to 10 like this:
  # fn_linear_map(1:20, c(1, 10))
  fn_r_diff <- function(x) x %>% range() %>% diff()
  flt_ratio <- fn_r_diff(vct_new_range) / fn_r_diff(vct_existing_val)
  vct_old_min_offset <- vct_existing_val - min(vct_existing_val)
  vct_new_range_val <- (vct_old_min_offset * flt_ratio) + min(vct_new_range)
  return(vct_new_range_val)
}
Now apply the function. We try to map mpg onto the range 1 to 4 (an attempt to match the ordinal scale).
mtcars$mpg_remap <- fn_linear_map(mtcars$mpg, c(1, 4))
Try the plot again. It is getting closer, but it is still not accurate.
# attempt 3: getting closer but doesn't really match the ordinal scale
p <- ggplot(data = df_mtcars_count, aes(x = bin, y = count))
p <- p + geom_bar(stat = "identity", width = 0.95)
p <- p + geom_text(aes(label = count), vjust = -0.5)
p <- p + scale_x_discrete("x title to go here", labels = df_mtcars_count$bin, breaks = df_mtcars_count$bin)
p <- p + geom_rug(data = mtcars, aes(x = mpg_remap), inherit.aes = F, alpha = 0.3)
p
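The plot above is getting closer to what I want, but the rug plot doesn't line up with the actual data. For example, the maximum observed value (33.9) should appear almost aligned with the right-hand side of the last bar. See below: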
mtcars %>% filter(bin == "(25,34]") %>% arrange(mpg) %>% dplyr::select(mpg, mpg_remap)
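To make the mismatch concrete, here is a minimal sketch that compares where the linear map puts each bin edge with where the bars are actually drawn. It assumes ggplot2 places the four discrete levels at x = 1, 2, 3, 4, so a bar of width 0.95 spans its centre plus or minus 0.475. The names remap, bar_left, bar_right and alignment are illustrative only and are not part of the code above.

```r
# Breaks from the question; mtcars$mpg ranges from 10.4 to 33.9.
vct_seq <- c(seq(from = 10, to = 25, by = 5), 34)
mpg_range <- range(mtcars$mpg)

# The same linear map as fn_linear_map(x, c(1, 4)), written in base R
# so this block runs on its own (illustrative helper).
remap <- function(x) (x - mpg_range[1]) / diff(mpg_range) * (4 - 1) + 1

# Where the bin edges land on the discrete axis under the linear map.
edges_remapped <- remap(vct_seq)

# Where the bars are drawn: centres 1..4, width 0.95.
bar_left  <- 1:4 - 0.95 / 2
bar_right <- 1:4 + 0.95 / 2

alignment <- data.frame(
  bin        = levels(cut(mtcars$mpg, breaks = vct_seq)),
  edge_left  = edges_remapped[1:4],
  edge_right = edges_remapped[2:5],
  bar_left,
  bar_right
)
print(alignment)

# The largest observation maps to the centre of the last bar,
# not to a point near its right-hand edge.
remap(max(mtcars$mpg))
```

A single linear map cannot fix this, because the last bin covers 9 mpg while the others cover 5, yet all four bars have the same drawn width.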
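Your scale makes no sense to me, since you're showing a bin that is twice as wide using the same bar width. Doing that in combination with a rug strikes me as confusing at best and misleading at worst. I suggest you plot the bars with their correct widths, after which the rug becomes trivial.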
I think the best solution is to just use geom_histogram:
ggplot(mtcars, aes(mpg)) +
  geom_histogram(breaks = vct_seq, col = 'grey80') +
  geom_rug(aes(mpg, y = NULL))
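The question also put count labels above the bars. One possible way to keep them with this approach, offered as a sketch rather than as part of the answer above, is to add a second stat_bin layer drawn with the text geom. It assumes ggplot2 3.3.0 or later for after_stat(); the object name p_hist is illustrative.

```r
library(ggplot2)

# Same custom breaks as in the question.
vct_seq <- c(seq(from = 10, to = 25, by = 5), 34)

p_hist <- ggplot(mtcars, aes(mpg)) +
  geom_histogram(breaks = vct_seq, col = "grey80") +
  # Re-run the binning with the same breaks and draw the counts as text.
  stat_bin(
    aes(label = after_stat(count)),
    breaks = vct_seq,
    geom = "text",
    vjust = -0.5
  ) +
  # The rug shares the continuous mpg scale, so no remapping is needed.
  geom_rug(alpha = 0.3)

p_hist
```

Because both layers bin with the same breaks, each label sits at the centre of its bar on the real mpg scale.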
If you really want gaps between the bars, you'll need to do a bit more work:
library(tidyr)
d <- mtcars %>%
  count(bin) %>%
  separate(bin, c('min', 'max'), sep = ',', remove = FALSE) %>%
  mutate_at(vars('min', 'max'), readr::parse_number) %>%
  mutate(
    middle = min + (max - min) / 2,
    width = 0.9 * (max - min)
  )
ggplot(d, aes(middle, n)) +
  geom_col(width = d$width) +
  geom_rug(aes(mpg, y = NULL), mtcars)
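A variant of the same idea, sketched here under the assumption that the break vector vct_seq is still available, takes the bin limits straight from the breaks instead of parsing the factor labels. The names bin_id, d2 and p_gap are hypothetical and used only for illustration. The count labels from the question are added back with geom_text.

```r
library(dplyr)
library(ggplot2)

vct_seq <- c(seq(from = 10, to = 25, by = 5), 34)

d2 <- mtcars %>%
  # labels = FALSE makes cut() return the integer index of each bin.
  mutate(bin_id = cut(mpg, breaks = vct_seq, labels = FALSE)) %>%
  count(bin_id) %>%
  mutate(
    min    = vct_seq[bin_id],
    max    = vct_seq[bin_id + 1],
    middle = (min + max) / 2,
    width  = 0.9 * (max - min)   # 10% of each bin is left as a gap
  )

p_gap <- ggplot(d2, aes(middle, n)) +
  geom_col(width = d2$width) +
  geom_text(aes(label = n), vjust = -0.5) +
  geom_rug(aes(mpg, y = NULL), mtcars, alpha = 0.3)

p_gap
```

Since the bars are positioned on the continuous mpg scale, the rug lines fall inside the bar that counts them, apart from values that sit in the gap near a bin edge.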