How to perform row-wise division in R with a group_by statement?

I have the following data frame:

    Year    Category       TotalSales   AverageCount
1   2013    Beverages      102074.29    22190.06
2   2013    Condiments      55277.56    14173.73
3   2013    Confections     36415.75    12138.58
4   2013    Dairy Products  30337.39    24400.00
5   2013    Seafood         53019.98    27905.25
6   2014    Beverages       81338.06    35400.00
7   2014    Condiments      55948.82    19981.72
8   2014    Confections     44478.36    24710.00
9   2014    Dairy Products  84412.36    32466.00
10  2014    Seafood         65544.19    14565.37

I computed the running total of TotalSales, grouped by Year, as follows:

dat <-within(dat, {
  RunningTotal <- ave(dat$TotalSales, dat$Year, FUN = cumsum)
}) 

The output looks like this:

    Year    Category        TotalSales AverageCount RunningTotal
1   2013    Beverages       102074.29   22190.06    102074.29
2   2013    Condiments      55277.56    14173.73    157351.85
3   2013    Confections     36415.75    12138.58    193767.60
4   2013    Dairy Products  30337.39    24400.00    224104.99
5   2013    Seafood         53019.98    27905.25    277124.97
6   2014    Beverages       81338.06    35400.00    81338.06
7   2014    Condiments      55948.82    19981.72    137286.88
8   2014    Confections     44478.36    24710.00    181765.24
9   2014    Dairy Products  84412.36    32466.00    266177.60
10  2014    Seafood         65544.19    14565.37    331721.79

How can I compute, within each group, the ratio between consecutive elements of RunningTotal (that is, RunningTotal[i+1] / RunningTotal[i])?

I tried using mutate from dplyr:

require(dplyr)
dat<-mutate(dat, Ratio = lag(RunningTotal)/RunningTotal)

I got incorrect output (note the NAs):

    Year    Category       TotalSales AverageCount  RunningTotal Ratio
1   2013    Beverages       102074.29   22190.06    102074.29   NA
2   2013    Condiments      55277.56    14173.73    157351.85   0.6487009
3   2013    Confections     36415.75    12138.58    193767.60   0.8120648
4   2013    Dairy Products  30337.39    24400.00    224104.99   0.8646287
5   2013    Seafood         53019.98    27905.25    277124.97   0.8086784
6   2014    Beverages       81338.06    35400.00    81338.06    NA
7   2014    Condiments      55948.82    19981.72    137286.88   0.5924678
8   2014    Confections     44478.36    24710.00    181765.24   0.7552978
9   2014    Dairy Products  84412.36    32466.00    266177.60   0.6828720
10  2014    Seafood         65544.19    14565.37    331721.79   0.8024122

How can I get the desired output shown below?

Year    Category       TotalSales AverageCount RunningTotal    Ratio
2013    Beverages       102074.29   22190.06    102074.29   1.5415424393
2013    Condiments      55277.56    14173.73    157351.85   1.2314288011
2013    Confections     36415.75    12138.58    193767.6    1.1565658552
2013    Dairy Products  30337.39    24400       224104.99   1.2365854504
2013    Seafood         53019.98    27905.25    277124.97   0.2935067887
2014    Beverages       81338.06    35400       81338.06    1.6878553533
2014    Condiments      55948.82    19981.72    137286.88   1.3239811408
2014    Confections     44478.36    24710       181765.24   1.4644032049
2014    Dairy Products  84412.36    32466       266177.6    1.2462423209
2014    Seafood         65544.19    14565.37    331721.79   0

Sample data:

dat <- structure(list(Year = c(2013L, 2013L, 2013L, 2013L, 2013L, 2014L, 
2014L, 2014L, 2014L, 2014L), Category = structure(c(1L, 2L, 3L, 
4L, 5L, 1L, 2L, 3L, 4L, 5L), .Label = c("Beverages", "Condiments", 
"Confections", "Dairy Products", "Seafood"), class = "factor"), 
    TotalSales = c(102074.29, 55277.56, 36415.75, 30337.39, 53019.98, 
    81338.06, 55948.82, 44478.36, 84412.36, 65544.19), AverageCount = c(22190.06, 
    14173.73, 12138.58, 24400, 27905.25, 35400, 19981.72, 24710, 
    32466, 14565.37)), .Names = c("Year", "Category", "TotalSales", 
"AverageCount"), class = "data.frame", row.names = c(NA, -10L
))

The dplyr approach for the first operation is:

dat <- dat %>% 
  group_by(Year) %>% 
  mutate(RunningTotal = cumsum(TotalSales)) %>% 
  ungroup()
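
As a quick sanity check, the grouped `cumsum()` should give the same column as the `ave()` call from the question. Here is a minimal sketch on a made-up data frame (`toy` and its values are hypothetical, not the question's data):

```r
library(dplyr)

# Hypothetical stand-in for the question's data
toy <- data.frame(
  Year       = c(2013L, 2013L, 2014L, 2014L),
  TotalSales = c(10, 20, 5, 15)
)

# Base R: cumulative sum restarted for each Year
base_rt <- ave(toy$TotalSales, toy$Year, FUN = cumsum)

# dplyr: the same computation with group_by() + mutate()
dplyr_rt <- toy %>%
  group_by(Year) %>%
  mutate(RunningTotal = cumsum(TotalSales)) %>%
  ungroup() %>%
  pull(RunningTotal)

# Both approaches should agree element by element
print(all.equal(base_rt, dplyr_rt))
```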

Then, to add the ratio, use:

dat %>% 
  mutate(Ratio = c(RunningTotal[-1] / RunningTotal[-n()], 0))
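
To see what the slicing in that expression does, here is a minimal base R sketch on a toy vector (the values in `x` are made up for illustration). `x[-1]` drops the first element and `x[-length(x)]` drops the last, so dividing them pairs each element with its successor. Inside an ungrouped `mutate()`, `n()` plays the role of `length(x)`.

```r
# Toy running total (hypothetical values, not from the question's data)
x <- c(10, 20, 50)

# x[-1] is c(20, 50); x[-length(x)] is c(10, 20)
ratio <- x[-1] / x[-length(x)]

# The division yields one element fewer than x, so pad with a
# trailing value to keep the result the same length as x
padded <- c(ratio, 0)
print(padded)
```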

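That said, I would be inclined to set the last value to NA rather than 0. The ratio for 2013 Seafood (0.2935067887) also makes no sense, because it divides the first 2014 running total by the last 2013 one. To avoid that, you do not want to ungroup before computing the ratio. So something like this: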

dat %>% 
  group_by(Year) %>% 
  mutate(
    RunningTotal = cumsum(TotalSales),
    Ratio = c(RunningTotal[-1] / RunningTotal[-n()], NA)
  )
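
An alternative worth considering is dplyr's `lead()`, which shifts a column up by one row within each group and so avoids the manual slicing. The sketch below is only an illustration on a made-up data frame (`toy` and its values are hypothetical), not the original answer's method:

```r
library(dplyr)

# Hypothetical stand-in for the question's data
toy <- data.frame(
  Year       = c(2013L, 2013L, 2013L, 2014L, 2014L),
  TotalSales = c(10, 10, 30, 5, 15)
)

res <- toy %>%
  group_by(Year) %>%
  mutate(
    RunningTotal = cumsum(TotalSales),
    # lead() returns the next row's value within the current Year,
    # and NA for the last row of each Year
    Ratio = lead(RunningTotal) / RunningTotal
  ) %>%
  ungroup()

print(res)
```

If a 0 is really wanted in the last row of each group, as in the desired output, `lead(RunningTotal, default = 0)` should give that, since 0 divided by the running total is 0.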