Order histograms in ggplot by proportion depending only on the 'yes' value of another variable in R

Question

I have data like this:

df <- data.frame(
  cancer = c(1, 0, 1, 0, 0, 1, 0, 0, 0, 0),
  CVD    = c(0, 1, 1, 0, 1, 0, 0, 0, 0, 0),
  diab   = c(0, 0, 0, 1, 0, 1, 0, 0, 1, 0),
  stroke = c(0, 1, 1, 0, 1, 0, 0, 0, 1, 0),
  asthma = c(1, 1, 1, 0, 1, 1, 0, 0, 0, 0),
  SR_hlt = c(1, 2, 2, 2, 1, 1, 2, 2, 2, 1))

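What I would like to do is make a bar plot, only for people who have the disease of interest, where the bars are ordered by the proportion of people with SR_hlt == 1.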

To make this plot, I use the following code:

1) Gather the data

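library(dplyr)
library(tidyr)
library(ggplot2)

df_grp <- df %>%
  gather(key = condition, value = Y_N, -SR_hlt) %>%  # long format: one row per person and condition
  group_by(condition, Y_N, SR_hlt) %>%
  summarise(count = n()) %>%
  # still grouped by condition and Y_N here, so freq is a percentage within those groups
  mutate(freq = round(count / sum(count) * 100, digits = 1))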

2) Plot this data

df_plot <- df_grp %>%
  filter(Y_N == 1) %>%
  ggplot(aes(x = reorder(condition, -freq), y = freq, fill = factor(SR_hlt)), width=0.5) +
  geom_bar(stat="identity", position = position_dodge(0.9))
df_plot

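x = reorder(condition, -freq) is what should order the bars, but I don't think it works in this case because the value of freq depends on a third variable, SR_hlt. Is it possible to order the bars by the value of freq when SR_hlt == 1?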
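
A quick way to check that suspicion, as a minimal sketch in base R. The data frame plot_df is a hypothetical name; its values are the Y_N == 1 percentages typed in by hand from what the code above produces for the example df. reorder() summarises its second argument with mean() by default, and the two percentages within each condition add up to 100, so every condition ends up with essentially the same sort key:

```r
# Percentages for Y_N == 1, copied by hand from df_grp for the example data
# (plot_df is an illustrative name, not part of the original code)
plot_df <- data.frame(
  condition = rep(c("asthma", "cancer", "CVD", "diab", "stroke"), each = 2),
  SR_hlt    = rep(c(1, 2), times = 5),
  freq      = c(60, 40, 66.7, 33.3, 33.3, 66.7, 33.3, 66.7, 25, 75)
)

# reorder() summarises -freq per condition with mean() by default
mean_key <- tapply(-plot_df$freq, plot_df$condition, mean)
print(mean_key)  # every condition gets -50 (up to rounding), so nothing is really ordered

# Restricting to SR_hlt == 1 gives a key that differs between conditions
sr1 <- plot_df[plot_df$SR_hlt == 1, ]
sr1_key <- setNames(sr1$freq, sr1$condition)
print(sort(sr1_key, decreasing = TRUE))
```

The second key is the one the bars need to be ordered by.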

Answer 1

This can be done with the handy forcats package, specifically fct_reorder2:

library(forcats)

df_plot <- df_grp  %>%
  filter(Y_N == 1) %>%
  ggplot(aes(x = fct_reorder2(condition, SR_hlt, -freq), 
             y = freq, fill = factor(SR_hlt)), width=0.5) +
  geom_bar(stat="identity", position = position_dodge(0.9))
df_plot

Here condition is set as a factor. Since SR_hlt == 1 is the value of interest, we order by SR_hlt from low to high and then by -freq, that is, by freq from high to low.

Alternatively, you can set the factor levels with plain dplyr before the ggplot call:

df_plot <- df_grp  %>%
  ungroup() %>% 
  filter(Y_N == 1) %>%
  arrange(SR_hlt, desc(freq)) %>% 
  mutate(condition = factor(condition, unique(condition))) %>% 
  ggplot(aes(x = condition, y = freq, fill = factor(SR_hlt)), width=0.5) +
  geom_bar(stat="identity", position = position_dodge(0.9))
df_plot

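In the above, I use arrange to sort the data frame by SR_hlt and then by descending freq, so the SR_hlt == 1 rows come first, from highest to lowest freq. Next, I use mutate to take advantage of the sorted data frame by turning condition into a factor whose levels follow their order of first appearance.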
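
The sort-then-unique trick can be seen in isolation with a small base R sketch. It assumes the same Y_N == 1 percentages as the example data, typed in by hand, and uses order() in place of arrange(); plot_df and sorted are illustrative names only:

```r
# Percentages for Y_N == 1, copied by hand from df_grp for the example data
plot_df <- data.frame(
  condition = rep(c("asthma", "cancer", "CVD", "diab", "stroke"), each = 2),
  SR_hlt    = rep(c(1, 2), times = 5),
  freq      = c(60, 40, 66.7, 33.3, 33.3, 66.7, 33.3, 66.7, 25, 75)
)

# Same idea as arrange(SR_hlt, desc(freq)): SR_hlt == 1 rows first, highest freq first
sorted <- plot_df[order(plot_df$SR_hlt, -plot_df$freq), ]

# unique() keeps values in order of first appearance, so the levels follow
# the SR_hlt == 1 ranking
sorted$condition <- factor(sorted$condition, levels = unique(sorted$condition))
print(levels(sorted$condition))
# [1] "cancer" "asthma" "CVD"    "diab"   "stroke"
```

CVD and diab tie at 33.3 for SR_hlt == 1, so they simply keep their original relative order.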

r

ggplot2

geom-bar