Summing in R with multiple conditions
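I'm trying to sum columns 4 (child), 5 (adult), and 6 (elderly) and return the values for each country by year, ignoring column 3 (sex). After reading through various forums, I still can't put these steps together: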
country year sex child adult elderly
1 Afghanistan 1995 male -1 -1 -1
2 Afghanistan 1996 female -1 -1 -1
3 Afghanistan 1996 male -1 -1 -1
4 Afghanistan 1997 female 5 96 1
5 Afghanistan 1997 male 0 26 0
6 Afghanistan 1998 female 45 1142 20
I was able to sum the 3 columns row by row and store the result in a separate column with the following, but I still need to combine the male and female rows for each country:
# row-wise total of the three age-group columns, stored as a new column
tuberculosis$tuberculosiscases <- tuberculosis$child + tuberculosis$adult + tuberculosis$elderly
head(tuberculosis)
country year sex child adult elderly tuberculosiscases
1 Afghanistan 1995 male -1 -1 -1 -3
2 Afghanistan 1996 female -1 -1 -1 -3
3 Afghanistan 1996 male -1 -1 -1 -3
4 Afghanistan 1997 female 5 96 1 102
5 Afghanistan 1997 male 0 26 0 26
6 Afghanistan 1998 female 45 1142 20 1207
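A possible alternative for the row-wise step, sketched in base R: `rowSums()` over the three age-group columns gives the same totals without writing out each addition. The data frame below is just the six rows printed above typed in by hand, so the column types (numeric `year`, character `sex`) are assumptions.

```r
# Sample data: the six rows printed in the question (types assumed)
tuberculosis <- data.frame(
  country = "Afghanistan",
  year    = c(1995, 1996, 1996, 1997, 1997, 1998),
  sex     = c("male", "female", "male", "female", "male", "female"),
  child   = c(-1, -1, -1, 5, 0, 45),
  adult   = c(-1, -1, -1, 96, 26, 1142),
  elderly = c(-1, -1, -1, 1, 0, 20)
)

# Row-wise total of the three age-group columns, added as a new column
tuberculosis$tuberculosiscases <- rowSums(tuberculosis[, c("child", "adult", "elderly")])

head(tuberculosis)
```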
If you want to add the sum to the data frame, there are several options:
# with base R (1)
dat <- transform(dat, tuber.sum = ave(tuberculosiscases, country, year, FUN = sum))
# with base R (2)
dat$tuber.sum <- ave(dat$tuberculosiscases, dat$country, dat$year, FUN = sum)
# with the data.table package
library(data.table)
setDT(dat)[, tuber.sum := sum(tuberculosiscases), by = .(country, year)]
# with the plyr package
library(plyr)
dat <- ddply(dat, .(country, year), transform, tuber.sum=sum(tuberculosiscases))
# with the dplyr package
library(dplyr)
dat <- dat %>%
  group_by(country, year) %>%
  mutate(tuber.sum = sum(tuberculosiscases))
All of these give:
> dat
country year sex child adult elderly tuberculosiscases tuber.sum
1: Afghanistan 1995 male -1 -1 -1 -3 -3
2: Afghanistan 1996 female -1 -1 -1 -3 -6
3: Afghanistan 1996 male -1 -1 -1 -3 -6
4: Afghanistan 1997 female 5 96 1 102 128
5: Afghanistan 1997 male 0 26 0 26 128
6: Afghanistan 1998 female 45 1142 20 1207 1207
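A minimal, self-contained run of base R option (2) on the six sample rows, assuming they are typed in as a data frame called `dat`. Since the question asks to merge the male and female rows, the last step keeps one row per country and year with `unique()`; the name `collapsed` is only for illustration.

```r
# Sample data: the six rows printed in the question (types assumed)
dat <- data.frame(
  country = "Afghanistan",
  year    = c(1995, 1996, 1996, 1997, 1997, 1998),
  sex     = c("male", "female", "male", "female", "male", "female"),
  child   = c(-1, -1, -1, 5, 0, 45),
  adult   = c(-1, -1, -1, 96, 26, 1142),
  elderly = c(-1, -1, -1, 1, 0, 20)
)
dat$tuberculosiscases <- dat$child + dat$adult + dat$elderly

# Group total, repeated on every row that shares the same country and year
dat$tuber.sum <- ave(dat$tuberculosiscases, dat$country, dat$year, FUN = sum)

# Keep one row per country-year, dropping the sex-specific columns
collapsed <- unique(dat[, c("country", "year", "tuber.sum")])
collapsed
```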
If I understand your question correctly, and assuming your initial data.frame is named my_df, I would use aggregate:
aggdata <- aggregate(my_df[, c("child", "adult", "elderly")],
                     by = list(country = my_df$country, year = my_df$year),
                     FUN = sum, na.rm = TRUE)
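A variant of the same idea using aggregate's formula interface, which names the output columns after the variables. `my_df` is rebuilt here from the six sample rows in the question (column types assumed), and the `total` column is an optional extra. Note that the formula method drops rows containing NA by default (`na.action = na.omit`) instead of passing `na.rm` to `sum`.

```r
# Sample data: the six rows printed in the question (types assumed)
my_df <- data.frame(
  country = "Afghanistan",
  year    = c(1995, 1996, 1996, 1997, 1997, 1998),
  sex     = c("male", "female", "male", "female", "male", "female"),
  child   = c(-1, -1, -1, 5, 0, 45),
  adult   = c(-1, -1, -1, 96, 26, 1142),
  elderly = c(-1, -1, -1, 1, 0, 20)
)

# One row per country-year; sex is ignored because it is not in the formula
aggdata <- aggregate(cbind(child, adult, elderly) ~ country + year,
                     data = my_df, FUN = sum)

# Optional: total across the three age groups
aggdata$total <- aggdata$child + aggdata$adult + aggdata$elderly
aggdata
```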