Calculating sums separately by group in R
Say I have the following dataset.
df=structure(list(ItemRelation = c(13250L, 13250L, 13250L, 13250L,
13250L, 13250L, 13250L, 13250L, 13250L, 13250L, 13250L, 13250L,
1300L, 1300L, 1300L, 1300L, 1300L, 1300L, 1300L, 1300L, 1300L,
1300L, 1300L, 1300L), SaleCount = c(354L, 679L, 397L, 473L, 614L,
404L, 127L, 434L, 786L, 127L, 434L, 786L, 354L, 679L, 397L, 473L,
614L, 404L, 127L, 434L, 786L, 127L, 434L, 786L), DocumentNum = c(336L,
336L, 336L, 336L, 336L, 336L, 336L, 336L, 336L, 336L, 336L, 336L,
335L, 335L, 335L, 335L, 335L, 335L, 335L, 335L, 335L, 335L, 335L,
335L), IsPromo = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L,
1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L)), .Names = c("ItemRelation",
"SaleCount", "DocumentNum", "IsPromo"), class = "data.frame", row.names = c(NA,
-24L))
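There is a variable IsPromo. It takes only the values 0 and 1.
So I need to calculate the sum of SaleCount separately for each group, but only for rows where IsPromo is 1.
The group is ItemRelation + SaleCount + DocumentNum.
How can I do this?
Desired output: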
ItemRelation DocumentNum sum1
13250 336 1347
1300 335 1347
Using dplyr:
library(dplyr)
df %>%
group_by(ItemRelation, DocumentNum) %>%
filter(IsPromo == 1) %>%
summarise(sum1 = sum(SaleCount))
# A tibble: 2 x 3
# Groups: ItemRelation [?]
ItemRelation DocumentNum sum1
<int> <int> <int>
1 1300 335 1347
2 13250 336 1347
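One thing to be aware of with the filter approach: a group that has no IsPromo == 1 rows disappears from the result. If such groups should be kept with a sum of 0, the condition can be moved inside summarise instead. This is only a sketch: the `toy` data frame below is made up for illustration (it is not the data from the question), and the `.groups` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical toy data: group (2, 20) has no promo rows at all
toy <- data.frame(
  ItemRelation = c(1L, 1L, 1L, 2L, 2L),
  DocumentNum  = c(10L, 10L, 10L, 20L, 20L),
  SaleCount    = c(5L, 7L, 9L, 4L, 6L),
  IsPromo      = c(0L, 1L, 1L, 0L, 0L)
)

res <- toy %>%
  group_by(ItemRelation, DocumentNum) %>%
  # Sum only the promo rows; a group without promo rows sums to 0
  summarise(sum1 = sum(SaleCount[IsPromo == 1]), .groups = "drop")

print(res)
```

Here group (1, 10) gets 7 + 9 = 16, and group (2, 20) is kept with a sum of 0 rather than being dropped.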
Here is a complementary base R solution using aggregate:
aggregate(SaleCount ~ ItemRelation + DocumentNum, subset(df, IsPromo == 1), sum)
# ItemRelation DocumentNum SaleCount
#1 1300 335 1347
#2 13250 336 1347
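Note that aggregate names the result column SaleCount, while the desired output calls it sum1. A minimal sketch of renaming it afterwards follows; to keep it short, `promo` is a hand-written stand-in containing only the IsPromo == 1 rows of the question's data, not the full `df`.

```r
# Stand-in for subset(df, IsPromo == 1): the six promo rows from the question
promo <- data.frame(
  ItemRelation = rep(c(13250L, 1300L), each = 3),
  DocumentNum  = rep(c(336L, 335L), each = 3),
  SaleCount    = rep(c(127L, 434L, 786L), times = 2)
)

res <- aggregate(SaleCount ~ ItemRelation + DocumentNum, promo, sum)

# Rename the summed column to match the desired output
names(res)[names(res) == "SaleCount"] <- "sum1"

print(res)
```

Each group sums to 127 + 434 + 786 = 1347, matching the desired output.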