Group and divide multiple values

Question

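I want to group my dataset by several variables and sum a numeric variable. I then want to divide each individual value by that group sum to get a proportion, and add the result as a new column with mutate.

For example, suppose I have a dataset like this: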

year         disastertype area(km^2)     country
2001           earthquake   1907.098 Afghanistan
2001           earthquake   3635.378 Afghanistan
2001           earthquake   5889.177 Afghanistan
2001 extreme temperature    8042.396 Afghanistan
2001 extreme temperature   11263.485 Afghanistan
2001 extreme temperature   11802.311 Afghanistan

I can get the total area for each disaster type and country using:

test_two <- test_one %>%
  group_by(disastertype, country, `area(km^2)`, year) %>%
  count %>%
  aggregate(. ~ disastertype + country + year, data = ., sum)

But when I try to divide each area by this sum, I get an error:

data_test$`area(km^2)` %>%  map_dbl(~ .x/data_test2$`area(km^2)`)

Error: Result 1 must be a single double, not a double vector of length 2

Expected result:

    year         disastertype area(km^2)     country  proportion
1   2001           earthquake   1907.098 Afghanistan  0.1668261   
10  2001           earthquake   3635.378 Afghanistan  0.3180099
65  2001           earthquake   5889.177 Afghanistan  0.5151642
109 2001 extreme temperature    8042.396 Afghanistan  0.2585299
135 2001 extreme temperature   11263.485 Afghanistan  0.3620746
146 2001 extreme temperature   11802.311 Afghanistan  0.3793956

Reproducible data:

structure(list(year = c(2001, 2001, 2001, 2001, 2001, 2001), 
    disastertype = c("earthquake", "earthquake", "earthquake", 
    "extreme temperature ", "extreme temperature ", "extreme temperature "
    ), `area(km^2)` = c(1907.09808242381, 3635.37825411105, 5889.17746880181, 
    8042.39623016696, 11263.4848508564, 11802.3111500339), country = c("Afghanistan", 
    "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", 
    "Afghanistan")), row.names = c(1L, 10L, 65L, 109L, 135L, 
146L), class = "data.frame")

Answer 1

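You should not group by area(km^2). Grouping by the area puts each distinct area value in its own group, so the sum no longer spans the rows you want to compare. The map_dbl() error arises because each element is divided by the whole vector of group sums, which returns more than one value per element. Group by year, country and disastertype only, and divide by the group sum inside mutate():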

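library(dplyr)

# df is the data frame from the question
df %>%
  group_by(year, country, disastertype) %>%
  # sum() is evaluated within each group, so every row is divided by its own group total
  mutate(proportion = `area(km^2)` / sum(`area(km^2)`)) %>%
  ungroup()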
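
As a self-contained check, here is a minimal sketch that rebuilds the six rows from the question (areas rounded to three decimals) and applies the same grouping. The names df, result and base_prop are illustrative assumptions, not part of the original post, and the ave() call is included only as a base R cross-check of the dplyr result.

```r
library(dplyr)

# Six rows from the question, with areas rounded to three decimals.
# check.names = FALSE keeps the non-syntactic column name `area(km^2)`.
df <- data.frame(
  year = rep(2001, 6),
  disastertype = rep(c("earthquake", "extreme temperature"), each = 3),
  `area(km^2)` = c(1907.098, 3635.378, 5889.177,
                   8042.396, 11263.485, 11802.311),
  country = "Afghanistan",
  check.names = FALSE
)

# Grouped mutate: each area is divided by the total of its own group.
result <- df %>%
  group_by(year, country, disastertype) %>%
  mutate(proportion = `area(km^2)` / sum(`area(km^2)`)) %>%
  ungroup()

# Base R cross-check: ave() applies FUN within each combination of
# the grouping vectors and returns values in the original row order.
base_prop <- ave(df$`area(km^2)`, df$year, df$country, df$disastertype,
                 FUN = function(x) x / sum(x))

# Both approaches should agree.
stopifnot(isTRUE(all.equal(result$proportion, base_prop)))

print(result)
```

Within each disaster type the proportions should add up to 1, and they should agree with the proportion column in the expected result up to the rounding of the inputs.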

Tags: r, purrr