How to show percentage of individuals on y axis instead of count in histogram by groups?

Question

I have a data frame like this:

> head(a)
         FID   IID FLASER PLASER DIABDUR HBA1C ESRD   pheno
1 fam1000-03 G1000      1      1      38  10.2    1 control
2 fam1001-03 G1001      1      1      15   7.3    1 control
3 fam1003-03 G1003      1      2      17   7.0    1    case
4 fam1005-03 G1005      1      1      36   7.7    1 control
5 fam1009-03 G1009      1      1      23   7.6    1 control
6 fam1052-03 G1052      1      1      32   7.3    1 control

My df has 1698 observations, of which 828 have "case" in the pheno column and 836 have "control" in the pheno column.

I make the histogram like this:

library(ggplot2)
ggplot(a, aes(x=HBA1C, fill=pheno)) + 
  geom_histogram(binwidth=.5, position="dodge")

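I would like the y axis to show the percentage of individuals with "case" or "control" in pheno instead of the count. That is, the percentage on the y axis should be calculated within each group ("case" or "control"). I also have NAs in my plot, and it would be good to exclude them.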

I think I can remove the NAs from pheno with this:

ggplot(data=subset(a, !is.na(pheno)), aes(x=HBA1C, fill=pheno)) + geom_histogram(binwidth=.5, position="dodge")
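
To make the target concrete, here is a minimal base R sketch of what "percentage within each group" means once the NA rows are dropped. The `toy` data frame, its values, and the bin breaks are made up for illustration; they are not the real data, and the breaks need not match the bins that geom_histogram() picks.

```r
# Toy data, made up for illustration (not the real 1698-row data frame).
toy <- data.frame(
  HBA1C = c(10.2, 7.3, 7.0, 7.7, 7.6, 7.3, 7.3, 8.1),
  pheno = c("control", "control", "case", "control",
            "control", "control", NA, "case")
)

# Keep only rows where pheno is known.
toy_complete <- subset(toy, !is.na(pheno))

# Bin HBA1C into 0.5-wide intervals. These breaks are an assumption and
# need not match the bins geom_histogram() chooses.
bins <- cut(toy_complete$HBA1C, breaks = seq(6.5, 10.5, by = 0.5))

# Row-wise proportions: each pheno row sums to 1.
within_group <- prop.table(table(toy_complete$pheno, bins), margin = 1)

# Show the same numbers as percentages.
round(100 * within_group, 1)
```

Each row of `within_group` sums to 1, so the bars of one group in the desired plot should add up to 100%.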

Answer 1

This can be achieved like so:

Note: you are right about the NAs. Simply subset for non-NA values, or use dplyr::filter or ...

a <- read.table(text = "id FID   IID FLASER PLASER DIABDUR HBA1C ESRD   pheno
1 fam1000-03 G1000      1      1      38  10.2    1 control
2 fam1001-03 G1001      1      1      15   7.3    1 control
3 fam1003-03 G1003      1      2      17   7.0    1    case
4 fam1005-03 G1005      1      1      36   7.7    1 control
5 fam1009-03 G1009      1      1      23   7.6    1 control
6 fam1052-03 G1052      1      1      32   7.3    1 control
                7 fam1052-03 G1052      1      1      32   7.3    1 NA", header = TRUE)

library(ggplot2)

ggplot(a, aes(x=HBA1C, fill=pheno)) + 
  geom_histogram(aes(y = ..count.. / tapply(..count.., ..group.., sum)[..group..]),
                 position='dodge', binwidth=0.5) +
  scale_y_continuous(labels = scales::percent)

Created on 2020-05-23 by the reprex package (v0.3.0)
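
Newer ggplot2 releases (3.4.0 and later) prefer after_stat() over the `..count..` notation. The following is a hedged sketch of the same idea in that style, with the NA rows removed first. It is one possible variant, not the answer's original code: the `toy` data frame is made up for illustration, and `ave()` stands in for the `tapply()` lookup because it returns a plain vector of per-group totals.

```r
library(ggplot2)

# Toy data, made up for illustration; only HBA1C and pheno matter here.
toy <- data.frame(
  HBA1C = c(10.2, 7.3, 7.0, 7.7, 7.6, 7.3, 7.3, 8.1),
  pheno = c("control", "control", "case", "control",
            "control", "control", NA, "case")
)

# Drop rows with missing pheno before plotting.
toy_complete <- subset(toy, !is.na(pheno))

p <- ggplot(toy_complete, aes(x = HBA1C, fill = pheno)) +
  geom_histogram(
    # Divide each bin count by the total count of its own group.
    aes(y = after_stat(count / ave(count, group, FUN = sum))),
    position = "dodge", binwidth = 0.5
  ) +
  scale_y_continuous(labels = scales::percent)

# Sanity check: within each group the bar heights should sum to 1.
bars <- layer_data(p)
group_totals <- tapply(bars$y, bars$group, sum)
group_totals
```

Printing `p` draws the plot. Inspecting `group_totals` is a quick way to confirm that the percentages are computed per group rather than over the whole data set.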

histogram

ggplot2