How to get percentages relative to one number in my table (which is 100 %) in R with tidyverse?
I am trying to compute percentages in R with the tidyverse:
- The first row, NSTEMI, is always 100 %, whatever its count is compared with the categories below it.
- The percentages for 'Seen by Cardiologist', 'Admitted to Cardiac Unit or Ward' and 'Eligible for Angiography' should be calculated with reference to NSTEMI. In layman's terms: out of NSTEMI, what percentage have been seen by the cardiologist, admitted to a cardiac unit, and are eligible for angiography?
- However, the percentages for the last two rows, 'Underwent Angiography' and 'Underwent Angiography Before Discharge', should be calculated with reference to 'Eligible for Angiography', which is my fourth row. That is: out of 'Eligible for Angiography', how many 'Underwent Angiography' and 'Underwent angiography before discharge'?
Here is a sample of my data:
df <- structure(list(Category = c("NSTEMI", "Seen by cardiologist",
    "Admitted to cardiac unit or Ward", "Eligible for angiography",
    "Underwent angiography", "Underwent angiography before discharge"
    ), Counts = c(196L, 196L, 158L, 174L, 174L, 173L)), row.names = c(NA,
    -6L), class = c("tbl_df", "tbl", "data.frame"))
An example of the table I want is shown in the image below:
As you can see, the desired table follows the rules I laid out above.
This particular task isn't really a natural fit for the tidyverse. But you could use:
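library(dplyr)

# df is the sample data from the question
df %>%
  # rows 1-4 form group 1, rows 5-6 form group 2
  group_by(grp = c(rep(1, 4), rep(2, 2))) %>%
  # divide each count by the first count of its group
  mutate(Frequency = Counts / first(Counts),
         Percentage = paste0(round(Frequency * 100, 2), " %")) %>%
  ungroup() %>%
  select(-grp)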
which returns
# A tibble: 6 x 4
  Category                               Counts Frequency Percentage
  <chr>                                   <int>     <dbl> <chr>
1 NSTEMI                                    196     1     100 %
2 Seen by cardiologist                      196     1     100 %
3 Admitted to cardiac unit or Ward          158     0.806 80.61 %
4 Eligible for angiography                  174     0.888 88.78 %
5 Underwent angiography                     174     1     100 %
6 Underwent angiography before discharge    173     0.994 99.43 %
This solution relies on the position of your categories; if necessary, you could use string comparison instead.
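As a rough illustration of the string-comparison alternative, the sketch below picks each row's reference by category name rather than by position. The helper names `angio_rows` and `Reference` are made up for this example and are not part of the original answer; the data is the sample from the question, rebuilt as a plain data frame.

```r
library(dplyr)

# Sample data from the question, rebuilt as a plain data frame.
df <- data.frame(
  Category = c("NSTEMI", "Seen by cardiologist",
               "Admitted to cardiac unit or Ward", "Eligible for angiography",
               "Underwent angiography", "Underwent angiography before discharge"),
  Counts = c(196L, 196L, 158L, 174L, 174L, 173L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the rows that should be measured against
# 'Eligible for angiography' instead of 'NSTEMI'.
angio_rows <- c("Underwent angiography", "Underwent angiography before discharge")

result <- df %>%
  mutate(
    # name of the category that counts as 100 % for this row
    Reference = if_else(Category %in% angio_rows,
                        "Eligible for angiography", "NSTEMI"),
    # look up the reference row's count by name and divide by it
    Frequency = Counts / Counts[match(Reference, Category)],
    Percentage = paste0(round(Frequency * 100, 2), " %")
  )

print(result)
```

Because the lookup goes through `match()` on the category names, this version should keep working if the rows are reordered, as long as the names themselves stay the same.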
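I created a grouping variable: the first four rows of your data belong to group 1 and the last two rows belong to group 2. c(rep(1,4), rep(2,2)) creates that grouping, 1 1 1 1 2 2: 1 repeated four times and 2 repeated twice. The first() function returns the first element of a group. Because of the grouping, the first element of group 1 is taken from the NSTEMI row and the first element of group 2 is taken from the Underwent angiography row. In this sample that row has the same count (174) as Eligible for angiography, so the percentages agree with the reference you asked for.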
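To make the grouping mechanics concrete, here is a minimal sketch that builds the grouping vector on its own and shows which count `first()` picks as the denominator in each group. The variable names `counts`, `grp` and `refs` are chosen for this illustration only.

```r
library(dplyr)

# Counts from the question's sample data, in row order.
counts <- c(196L, 196L, 158L, 174L, 174L, 173L)

# Position-based grouping vector: rows 1-4 -> group 1, rows 5-6 -> group 2.
grp <- c(rep(1, 4), rep(2, 2))
print(grp)

# first() returns the first Counts value within each group;
# that value is what every count in the group gets divided by.
refs <- data.frame(Counts = counts, grp = grp) %>%
  group_by(grp) %>%
  summarise(reference = first(Counts))

print(refs)
```

Here group 1 is divided by the count in row 1 and group 2 by the count in row 5, which is why the approach depends on the row order staying fixed.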