New variable from grouped calculation in R

I have a dataset:

library(dplyr)
my_df <- data.frame(day = c(1,1,1,2,2,2,3,3,3), age = c(18, 18, 18, 25, 18, 35, 76, 76, 15))
my_df
#   day age
# 1   1  18
# 2   1  18
# 3   1  18
# 4   2  25
# 5   2  18
# 6   2  35
# 7   3  76
# 8   3  76
# 9   3  15

For each row, I want to know the frequency and percentage of each age within a given day. For example, I can compute this with a dplyr chain:

my_df %>%
  group_by(day, age) %>%
  summarize(n=n()) %>%
  group_by(day) %>%
  mutate(pct = n/sum(n))
#     day   age    n   pct
# 1     1    18    3   1    
# 2     2    18    1   0.333
# 3     2    25    1   0.333
# 4     2    35    1   0.333
# 5     3    15    1   0.333
# 6     3    76    2   0.667

How can I add the values of n back to my original df? Desired output:

#   day age  n
# 1   1  18  3
# 2   1  18  3
# 3   1  18  3
# 4   2  25  1
# 5   2  18  1
# 6   2  35  1
# 7   3  76  2
# 8   3  76  2
# 9   3  15  1

I would store the summary as a variable, like this:

my_helper_df <- my_df %>%
  group_by(day, age) %>%
  summarize(n=n()) %>%
  group_by(day) %>%
  mutate(pct = n/sum(n))

Then left_join it back to the original data frame (my_df), like this:

final_df <- dplyr::left_join(my_df, my_helper_df, by = c("day", "age"))
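
If only the count is needed, the same join idea can probably be shortened. The following is a minimal sketch, assuming the my_df from the question: count() builds the frequency table per (day, age) in one step, and left_join() attaches it to every original row. The names helper and joined are only illustrative and do not come from the original post.

```r
library(dplyr)

my_df <- data.frame(day = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                    age = c(18, 18, 18, 25, 18, 35, 76, 76, 15))

# One row per (day, age) combination, with its frequency in column n
# (helper is a hypothetical name used for illustration)
helper <- count(my_df, day, age)

# left_join keeps every row of my_df, in its original order,
# and adds the matching n from helper
joined <- left_join(my_df, helper, by = c("day", "age"))
```

Because helper has exactly one row per (day, age) pair, the join should not duplicate any rows of my_df.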

For your desired output, we can use add_count():

library(dplyr)
my_df %>% 
  add_count(day, age)
#   day age n
# 1   1  18 3
# 2   1  18 3
# 3   1  18 3
# 4   2  25 1
# 5   2  18 1
# 6   2  35 1
# 7   3  76 2
# 8   3  76 2
# 9   3  15 1
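
The question also mentions the percentage within each day. As a hedged sketch (not part of the original answer), that column could be added on top of add_count() by dividing each row's n by the number of rows in its day group. The name with_pct is hypothetical.

```r
library(dplyr)

my_df <- data.frame(day = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                    age = c(18, 18, 18, 25, 18, 35, 76, 76, 15))

with_pct <- my_df %>%
  add_count(day, age) %>%    # n = rows sharing this (day, age) pair
  group_by(day) %>%
  mutate(pct = n / n()) %>%  # n is the column, n() is the rows in this day
  ungroup()
```

This gives the same pct values as the summarize() chain in the question, but keeps one row per original observation.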