Group and divide multiple values
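I want to group my dataset by several variables and then sum a numeric variable. Then I want to divide each individual value by that sum to get a proportion, and add the result as a new column with `mutate`.
For example, suppose I have a dataset like this: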
year  disastertype         area(km^2)  country
2001  earthquake             1907.098  Afghanistan
2001  earthquake             3635.378  Afghanistan
2001  earthquake             5889.177  Afghanistan
2001  extreme temperature    8042.396  Afghanistan
2001  extreme temperature   11263.485  Afghanistan
2001  extreme temperature   11802.311  Afghanistan
I can get the total area for each disaster type, country and year with:
test_two <- test_one %>%
  group_by(disastertype, country, `area(km^2)`, year) %>%
  count() %>%
  aggregate(. ~ disastertype + country + year, data = ., sum)
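If only the per-group total is needed, a minimal base-R sketch can pass the area column straight to the formula interface of `aggregate()`, with no `group_by()` or `count` step. It assumes the data frame is called `df` and holds the reproducible data from the question; the name `totals` is hypothetical.

```r
# Assumption: `df` holds the question's reproducible data
# (note the trailing space in "extreme temperature ").
df <- data.frame(
  year = rep(2001, 6),
  disastertype = rep(c("earthquake", "extreme temperature "), each = 3),
  `area(km^2)` = c(1907.09808242381, 3635.37825411105, 5889.17746880181,
                   8042.39623016696, 11263.4848508564, 11802.3111500339),
  country = rep("Afghanistan", 6),
  check.names = FALSE  # keep the non-syntactic column name unchanged
)

# Sum the area within each disastertype/country/year group.
# Backticks let the formula refer to the non-syntactic column name.
totals <- aggregate(`area(km^2)` ~ disastertype + country + year,
                    data = df, FUN = sum)
print(totals)
```

This gives one row per group, which is exactly why dividing every area by the whole `totals` column cannot work element by element: there is more than one total.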
But when I try to divide the areas by this sum:
data_test$`area(km^2)` %>% map_dbl(~ .x/data_test2$`area(km^2)`)
Error: Result 1 must be a single double, not a double vector of length 2
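The error comes from `map_dbl()` calling the function once per area value: each `.x / data_test2$`area(km^2)`` divides one number by both group totals and returns a length-2 vector, while `map_dbl()` requires every result to be a single double. One way to keep the two-step idea (compute totals, then divide) is to look up each row's own group total before dividing. A minimal base-R sketch, assuming the question's data is in `df`; the names `totals`, `total_area`, `row_key` and `group_key` are hypothetical.

```r
# Assumption: `df` holds the question's reproducible data.
df <- data.frame(
  year = rep(2001, 6),
  disastertype = rep(c("earthquake", "extreme temperature "), each = 3),
  `area(km^2)` = c(1907.09808242381, 3635.37825411105, 5889.17746880181,
                   8042.39623016696, 11263.4848508564, 11802.3111500339),
  country = rep("Afghanistan", 6),
  check.names = FALSE
)

# Step 1: one total per disastertype/country/year group
totals <- aggregate(`area(km^2)` ~ disastertype + country + year,
                    data = df, FUN = sum)
names(totals)[ncol(totals)] <- "total_area"  # hypothetical column name

# Step 2: build a key for every row and every group, then use match()
# to fetch the one total that belongs to each row
row_key   <- paste(df$disastertype, df$country, df$year, sep = "|")
group_key <- paste(totals$disastertype, totals$country, totals$year, sep = "|")
df$proportion <- df$`area(km^2)` / totals$total_area[match(row_key, group_key)]

print(df)
```

Because `match()` returns one index per row, each area is divided by a single number and the original row order is preserved.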
Expected result:
    year  disastertype         area(km^2)  country      proportion
1   2001  earthquake             1907.098  Afghanistan   0.1668261
10  2001  earthquake             3635.378  Afghanistan   0.3180099
65  2001  earthquake             5889.177  Afghanistan   0.5151642
109 2001  extreme temperature    8042.396  Afghanistan   0.2585299
135 2001  extreme temperature   11263.485  Afghanistan   0.3620746
146 2001  extreme temperature   11802.311  Afghanistan   0.3793956
Reproducible data:
structure(list(year = c(2001, 2001, 2001, 2001, 2001, 2001),
disastertype = c("earthquake", "earthquake", "earthquake",
"extreme temperature ", "extreme temperature ", "extreme temperature "
), `area(km^2)` = c(1907.09808242381, 3635.37825411105, 5889.17746880181,
8042.39623016696, 11263.4848508564, 11802.3111500339), country = c("Afghanistan",
"Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan",
"Afghanistan")), row.names = c(1L, 10L, 65L, 109L, 135L,
146L), class = "data.frame")
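You should not group by `area(km^2)`. Group only by the variables that define each group, then divide within the group:
library(dplyr)

df %>%
  group_by(year, country, disastertype) %>%
  # sum() is evaluated per group, so each area is divided by its own group total
  mutate(proportion = `area(km^2)` / sum(`area(km^2)`)) %>%
  ungroup()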
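Without dplyr, the same group-wise division can be sketched in base R with `ave()`, which applies a function within each group and returns a vector aligned with the original rows. This is a swapped-in base-R technique, not part of the answer above; it assumes the question's data is in `df`, and `group_total` is a hypothetical name.

```r
# Assumption: `df` holds the question's reproducible data.
df <- data.frame(
  year = rep(2001, 6),
  disastertype = rep(c("earthquake", "extreme temperature "), each = 3),
  `area(km^2)` = c(1907.09808242381, 3635.37825411105, 5889.17746880181,
                   8042.39623016696, 11263.4848508564, 11802.3111500339),
  country = rep("Afghanistan", 6),
  check.names = FALSE
)

# ave() computes sum() within each year/country/disastertype group and
# repeats that total on every row of the group, in the original row order
group_total <- ave(df$`area(km^2)`, df$year, df$country, df$disastertype,
                   FUN = sum)
df$proportion <- df$`area(km^2)` / group_total

# Sanity check: proportions should add up to 1 within each group
print(tapply(df$proportion, df$disastertype, sum))
```

Like `mutate()` on a grouped data frame, `ave()` keeps one value per row, which is what a proportion column needs.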