How to perform row-wise division in R with a group_by statement?
I have the following data frame:
   Year       Category TotalSales AverageCount
1  2013      Beverages  102074.29     22190.06
2  2013     Condiments   55277.56     14173.73
3  2013    Confections   36415.75     12138.58
4  2013 Dairy Products   30337.39     24400.00
5  2013        Seafood   53019.98     27905.25
6  2014      Beverages   81338.06     35400.00
7  2014     Condiments   55948.82     19981.72
8  2014    Confections   44478.36     24710.00
9  2014 Dairy Products   84412.36     32466.00
10 2014        Seafood   65544.19     14565.37
I computed the cumulative sum of TotalSales, grouped by Year, as follows:
dat <- within(dat, {
  RunningTotal <- ave(TotalSales, Year, FUN = cumsum)
})
The output looks like this:
   Year       Category TotalSales AverageCount RunningTotal
1  2013      Beverages  102074.29     22190.06    102074.29
2  2013     Condiments   55277.56     14173.73    157351.85
3  2013    Confections   36415.75     12138.58    193767.60
4  2013 Dairy Products   30337.39     24400.00    224104.99
5  2013        Seafood   53019.98     27905.25    277124.97
6  2014      Beverages   81338.06     35400.00     81338.06
7  2014     Condiments   55948.82     19981.72    137286.88
8  2014    Confections   44478.36     24710.00    181765.24
9  2014 Dairy Products   84412.36     32466.00    266177.60
10 2014        Seafood   65544.19     14565.37    331721.79
How can I compute, within each group, the ratio between consecutive elements of RunningTotal, that is, RunningTotal[i+1] / RunningTotal[i]?
I tried using mutate from dplyr:
library(dplyr)
dat <- mutate(dat, Ratio = lag(RunningTotal) / RunningTotal)
I got incorrect output (note the NAs):
   Year       Category TotalSales AverageCount RunningTotal     Ratio
1  2013      Beverages  102074.29     22190.06    102074.29        NA
2  2013     Condiments   55277.56     14173.73    157351.85 0.6487009
3  2013    Confections   36415.75     12138.58    193767.60 0.8120648
4  2013 Dairy Products   30337.39     24400.00    224104.99 0.8646287
5  2013        Seafood   53019.98     27905.25    277124.97 0.8086784
6  2014      Beverages   81338.06     35400.00     81338.06        NA
7  2014     Condiments   55948.82     19981.72    137286.88 0.5924678
8  2014    Confections   44478.36     24710.00    181765.24 0.7552978
9  2014 Dairy Products   84412.36     32466.00    266177.60 0.6828720
10 2014        Seafood   65544.19     14565.37    331721.79 0.8024122
How can I get the desired output shown below?
Year       Category TotalSales AverageCount RunningTotal        Ratio
2013      Beverages  102074.29     22190.06    102074.29 1.5415424393
2013     Condiments   55277.56     14173.73    157351.85 1.2314288011
2013    Confections   36415.75     12138.58     193767.6 1.1565658552
2013 Dairy Products   30337.39        24400    224104.99 1.2365854504
2013        Seafood   53019.98     27905.25    277124.97 0.2935067887
2014      Beverages   81338.06        35400     81338.06 1.6878553533
2014     Condiments   55948.82     19981.72    137286.88 1.3239811408
2014    Confections   44478.36        24710    181765.24 1.4644032049
2014 Dairy Products   84412.36        32466     266177.6 1.2462423209
2014        Seafood   65544.19     14565.37    331721.79            0
Sample data:
dat <- structure(list(Year = c(2013L, 2013L, 2013L, 2013L, 2013L, 2014L,
2014L, 2014L, 2014L, 2014L), Category = structure(c(1L, 2L, 3L,
4L, 5L, 1L, 2L, 3L, 4L, 5L), .Label = c("Beverages", "Condiments",
"Confections", "Dairy Products", "Seafood"), class = "factor"),
TotalSales = c(102074.29, 55277.56, 36415.75, 30337.39, 53019.98,
81338.06, 55948.82, 44478.36, 84412.36, 65544.19), AverageCount = c(22190.06,
14173.73, 12138.58, 24400, 27905.25, 35400, 19981.72, 24710,
32466, 14565.37)), .Names = c("Year", "Category", "TotalSales",
"AverageCount"), class = "data.frame", row.names = c(NA, -10L))
The dplyr way to do the first operation is:
dat <- dat %>%
  group_by(Year) %>%
  mutate(RunningTotal = cumsum(TotalSales)) %>%
  ungroup()
Then add the ratio using:
dat %>%
  mutate(Ratio = c(RunningTotal[-1] / RunningTotal[-n()], 0))
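That said, I would be tempted to set the last value to NA rather than 0. The ratio for 2013 Seafood (0.2935067887) does not make sense either: it divides the first 2014 running total by the last 2013 one. To avoid that, don't ungroup before computing the ratio. Something like this: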
dat %>%
  group_by(Year) %>%
  mutate(
    RunningTotal = cumsum(TotalSales),
    Ratio = c(RunningTotal[-1] / RunningTotal[-n()], NA)
  )
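As a possible variant of the grouped approach above, dplyr's lead() can express the same "next value over current value" ratio without manual indexing. This is only a sketch: the data frame sales and its values are made up for illustration and are not the data from the question.

```r
library(dplyr)

# Small made-up data set (hypothetical values, not the data from the question)
sales <- data.frame(
  Year       = c(2013L, 2013L, 2013L, 2014L, 2014L, 2014L),
  TotalSales = c(100, 50, 50, 10, 10, 20)
)

result <- sales %>%
  group_by(Year) %>%
  mutate(
    RunningTotal = cumsum(TotalSales),
    # lead() returns the next value within the current group,
    # so the last row of every Year gets NA instead of a cross-year ratio
    Ratio = lead(RunningTotal) / RunningTotal
  ) %>%
  ungroup()

print(result)
```

Because lead() respects the grouping, there is no need to pad the last element by hand. The question's attempt used lag(), which gives the previous value over the current one, i.e. the ratio in the opposite direction.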