Group by and summarise

I want to group by 3 variables and create new variables with summary functions.

My code:

Option 1

library(tidyverse)
library(dplyr)

example2<-example%>%
  group_by(age_cohort,sex,city)%>%
  summarise(rich=sum(rich),
            middleclass=sum(middleclass),
            poor=sum(poor),
            population=count(id))

I don't understand the error:

Error in `summarise()`:
! Problem while computing `population = count(id)`.
i The error occurred in group 1: age_cohort = 1, sex = 0, city = 1.
Caused by error in `UseMethod()`:
! no applicable method for 'count' applied to an object of class "c('double', 'numeric')"
Run `rlang::last_error()` to see where the error occurred.

Option 2

example3<-example%>%
  group_by(age_cohort,sex,city)%>%
  summarise(rich=sum(rich),
            middleclass=sum(middleclass),
            poor=sum(poor),
            population=n(id))

Error:

Error in `summarise()`:
! Problem while computing `population = n(id)`.
i The error occurred in group 1: age_cohort = 1, sex = 0, city = 1.
Caused by error in `n()`:
! unused argument (id)
Run `rlang::last_error()` to see where the error occurred.

Also, even if I remove the 'population' variable, my code still fails.

New code

example<-example%>%
  group_by(age_cohort,sex,city)%>%
  summarise(rich=sum(rich),
            middleclass=sum(middleclass),
            poor=sum(poor))

Error:

Error in UseMethod("group_by") : 
  no applicable method for 'group_by' applied to an object of class "function"

Original data (example):

id  sex  city  rich  middleclass  poor  age_cohort
1   0    1     1     0            0     1
2   1    1     0     1            0     5
3   1    2     0     0            1     2
4   0    2     0     0            1     3
5   1    3     0     0            1     4
6   0    4     0     1            0     1
7   0    6     0     1            0     1
8   1    7     1     0            0     5
9   0    3     1     0            0     5
10  1    7     0     1            NA    5
11  1    3     0     0            1     2
12  1    1     0     0            1     3

As akrun said, you need population = n().

library(dplyr)

# Recreate the example data; the missing 'poor' value for id 10 becomes NA
id <- 1:12
sex <- c(0,1,1,0,1,0,0,1,0,1,1,1)
city <- c(1,1,2,2,3,4,6,7,3,7,3,1)
rich <- c(1,0,0,0,0,0,0,1,1,0,0,0)
middleclass <- c(0,1,0,0,0,1,1,0,0,1,0,0)
poor <- c(0,0,1,1,1,0,0,0,0,NA,1,1)
age_cohort <- c(1,5,2,3,4,1,1,5,5,5,2,3)
example <- data.frame(id, sex, city, rich, middleclass, poor, age_cohort)

# n() takes no arguments: it returns the number of rows in the current group
example3 <- example %>%
  group_by(age_cohort, sex, city) %>%
  summarise(rich = sum(rich),
            middleclass = sum(middleclass),
            poor = sum(poor),
            population = n())

Output

> example
   id sex city rich middleclass poor age_cohort
1   1   0    1    1           0    0          1
2   2   1    1    0           1    0          5
3   3   1    2    0           0    1          2
4   4   0    2    0           0    1          3
5   5   1    3    0           0    1          4
6   6   0    4    0           1    0          1
7   7   0    6    0           1    0          1
8   8   1    7    1           0    0          5
9   9   0    3    1           0    0          5
10 10   1    7    0           1   NA          5
11 11   1    3    0           0    1          2
12 12   1    1    0           0    1          3
> example3
# A tibble: 11 x 7
# Groups:   age_cohort, sex [7]
   age_cohort   sex  city  rich middleclass  poor population
        <dbl> <dbl> <dbl> <dbl>       <dbl> <dbl>      <int>
 1          1     0     1     1           0     0          1
 2          1     0     4     0           1     0          1
 3          1     0     6     0           1     0          1
 4          2     1     2     0           0     1          1
 5          2     1     3     0           0     1          1
 6          3     0     2     0           0     1          1
 7          3     1     1     0           0     1          1
 8          4     1     3     0           0     1          1
 9          5     0     3     1           0     0          1
10          5     1     1     0           1     0          1
11          5     1     7     1           1    NA          2

Why the errors occur

As others pointed out in the comments:

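The first error occurs because count() works on a data frame plus variable names, for example count(example, sex); it is not a summary function you can call inside summarise(). Inside summarise(), id is just a numeric vector (an object of class "c('double', 'numeric')"), and count() has no method for that class (no applicable method for 'count' applied to...).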
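A minimal sketch of where count() does fit: called on the data frame itself, outside summarise(), it gives one row per group with the group size, which is the same number the population column holds. The data is rebuilt from the answer so the block runs on its own; the object name pop is just for illustration.

```r
library(dplyr)

# Rebuild the example data from the answer so this block is self-contained
example <- data.frame(
  id          = 1:12,
  sex         = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1),
  city        = c(1, 1, 2, 2, 3, 4, 6, 7, 3, 7, 3, 1),
  rich        = c(1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0),
  middleclass = c(0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0),
  poor        = c(0, 0, 1, 1, 1, 0, 0, 0, 0, NA, 1, 1),
  age_cohort  = c(1, 5, 2, 3, 4, 1, 1, 5, 5, 5, 2, 3)
)

# count() takes a data frame plus column names and returns one row
# per (age_cohort, sex, city) combination with the number of rows in it
pop <- count(example, age_cohort, sex, city, name = "population")
print(pop)

# Passing a bare numeric vector reproduces the original error
try(count(example$id))
```

Here count() is used on its own; inside summarise() the equivalent is n().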

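The second error occurs because n() takes no arguments: it simply returns the number of rows in the current group (see ?context). The groups are already defined by group_by(), so there is nothing to pass to it; giving it id therefore produces unused argument (id).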
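To contrast n() with summary helpers that do take a column, here is a small sketch on the same data. It assumes dplyr 1.0 or later, where summarise() accepts .groups; n_ids and n_poor_obs are illustrative column names, not part of the original answer.

```r
library(dplyr)

example <- data.frame(
  id          = 1:12,
  sex         = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1),
  city        = c(1, 1, 2, 2, 3, 4, 6, 7, 3, 7, 3, 1),
  poor        = c(0, 0, 1, 1, 1, 0, 0, 0, 0, NA, 1, 1),
  age_cohort  = c(1, 5, 2, 3, 4, 1, 1, 5, 5, 5, 2, 3)
)

summary_tbl <- example %>%
  group_by(age_cohort, sex, city) %>%
  summarise(
    population = n(),                # rows in the current group; no arguments
    n_ids      = n_distinct(id),     # distinct id values; this one takes a column
    n_poor_obs = sum(!is.na(poor)),  # non-missing 'poor' values in the group
    .groups    = "drop"              # return an ungrouped tibble
  )
print(summary_tbl)
```

Because every id is unique here, population and n_ids agree; they would differ only if an id appeared more than once in a group.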

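The last error occurs because you had not created the object example in your environment before calling group_by(). In fact, example is the name of a function in the utils package (see ?example). So if you don't create an object with that name, R assumes you mean the function example(). You then try to group it, and R cannot, because group_by() only works on data frames: where it expected a data frame, you gave it an argument of class function (an object of class "function").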
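A short sketch that reproduces this and shows the fix. utils::example is referenced explicitly so the result does not depend on what is already in your workspace; the name survey and its four rows (the first four rows of the example data) are hypothetical, chosen only because the name does not shadow a function.

```r
library(dplyr)

# utils::example is a function; this is what `example` resolves to
# when no data frame with that name has been created yet
print(class(utils::example))  # "function"

# Fails: group_by() has no method for objects of class "function"
try(group_by(utils::example, sex))

# Create the data frame first, ideally under a name that is not
# already a function, and grouping works as expected
survey <- data.frame(
  id   = 1:4,
  sex  = c(0, 1, 1, 0),
  city = c(1, 1, 2, 2)
)
by_sex <- survey %>%
  group_by(sex) %>%
  summarise(population = n())
print(by_sex)
```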