Histogram of one column depending on the relative frequency of another column (R)

Suppose we have a table:

x  y  
1  43
1  54
2  54   
3  22
2  22
1  43

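I want only 1, 2, 3 on the x-axis, so that the plot recognizes the unique values of x. In addition, it should show as a percentage how often each y value occurs within each x value: how often 43 occurs within x = 1, how often 54 occurs, and so on. Should both columns be converted to factors?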

Here is my solution:

library("ggplot2")
library("dplyr")
library("magrittr")
library("tidyr")

df <- data.frame(x = c(1,1,2,3,2,1), y = c(43,54,54,22,22,43))

#Create a counter column so the occurrences of each y value
#within each x category can be summed
df$n <- 1
df %<>% #Assignment pipe (magrittr): overwrites 'df' with the result!
  group_by(x, y) %>% #Ordinary forward pipe
  tally(n) %>% #Sums n per (x, y) pair; the result stays grouped by x
  mutate(n = round(n/sum(n), 2)) #Share of each y within its x, as a proportion

#Plotting
df %>% 
  ggplot(aes(fill = as.factor(y), y = n, x = x)) + 
  geom_bar(position = "fill", stat = "identity") + 
  scale_y_continuous(labels = scales::percent) +
  labs(y = "Percentage contribution from each y category") + 
  #Adding the percentage values as labels
  geom_text(aes(label = paste0(n*100,"%")), position = position_fill(vjust = 0.5), size = 2)

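Note: each bar is scaled to a total height of 1 because position = "fill" is passed to geom_bar(); scale_y_continuous(labels = scales::percent) then formats the y-axis values as percentages.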
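
On the question of factors: a minimal sketch of a more compact variant, assuming a dplyr version that provides count(). It is not the only way to do this. The names shares and share are chosen here for illustration. Wrapping both x and y in factor() makes them discrete, so the x-axis gets exactly one tick per unique value and each y value gets its own fill colour.

```r
library(dplyr)
library(ggplot2)

df <- data.frame(x = c(1, 1, 2, 3, 2, 1), y = c(43, 54, 54, 22, 22, 43))

# Count each (x, y) pair, then turn the counts into shares within each x.
# 'shares' and 'share' are illustrative names, not part of the original answer.
shares <- df %>%
  count(x, y) %>%                 # one row per (x, y) pair, count in column n
  group_by(x) %>%
  mutate(share = n / sum(n)) %>%  # unrounded proportion of each y within its x
  ungroup()

# factor() makes both variables discrete; the shares already sum to 1 per x,
# so plain stacking fills each bar to 100%.
p <- ggplot(shares, aes(x = factor(x), y = share, fill = factor(y))) +
  geom_col() +
  geom_text(aes(label = scales::percent(share, accuracy = 1)),
            position = position_stack(vjust = 0.5), size = 3) +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "x", y = "Share of each y value within x", fill = "y")

print(p)
```

Keeping the proportions unrounded and rounding only in the label avoids bars that sum to slightly more or less than 100% when there are many categories.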