Aggregate by two columns in R

Question

I have a dataset with three columns:

  Date1      StudentId  Status
  08/04/2014 155261     Yes
  08/04/2014 155261     No
  08/25/2014 236991     Yes
  08/27/2014 236991     Yes
  08/29/2014 236991     Yes

I am trying to aggregate Status by StudentId and Date1, so that the final dataset looks like this:

  Date1      StudentId  Response
  08/04/2014 155261     Yes, No
  08/25/2014 236991     Yes
  08/27/2014 236991     Yes
  08/29/2014 236991     Yes

I tried the following with gsub, but it did not work: it aggregates by StudentId only and ignores Date1. Any help with this would be much appreciated.

 dataset1[,Response:=gsub("(, )+$","",c(paste(Status,collapse=", "),rep("",.N-1))),by=c("StudentId ","Date1")]

Answer 1

df <- data.frame(
  Date1     = c('08/04/2014', '08/04/2014', '08/25/2014', '08/27/2014', '08/29/2014'),
  StudentId = c(155261, 155261, 236991, 236991, 236991),
  Status    = c('Yes', 'No', 'Yes', 'Yes', 'Yes')
)
# Paste together all Status values within each Date1/StudentId group
aggregate(Status ~ Date1 + StudentId, df, paste, collapse = ', ')
##        Date1 StudentId  Status
## 1 08/04/2014    155261 Yes, No
## 2 08/25/2014    236991     Yes
## 3 08/27/2014    236991     Yes
## 4 08/29/2014    236991     Yes

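You can then rename the column from Status to Response. Note that the rename has to be applied to the aggregated result, not to the original df, so store the result first:

res <- aggregate(Status ~ Date1 + StudentId, df, paste, collapse = ', ')
names(res)[names(res) == 'Status'] <- 'Response'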

Answer 2

You don't need gsub to concatenate the strings (thanks to @DavidArenburg for the simplification):

DT1 <- DT[,list(Response=toString(Status)),by=list(Date1,StudentId)]

If a student can appear more than once with the same status within a group, you will want to use unique on Status.
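As a rough sketch of the difference, the snippet below builds a small data.table from the question's data plus one hypothetical duplicate row (a second "Yes" for student 155261 on 08/04/2014, which is not in the original data). The names `with_dups` and `no_dups` are made up for illustration.

```r
library(data.table)

# Sample rows from the question, plus one hypothetical duplicate
# (a repeated "Yes" for student 155261 on 08/04/2014).
DT <- data.table(
  Date1     = c("08/04/2014", "08/04/2014", "08/04/2014", "08/25/2014"),
  StudentId = c(155261, 155261, 155261, 236991),
  Status    = c("Yes", "No", "Yes", "Yes")
)

# Without unique(): every row in the group contributes to the string.
with_dups <- DT[, list(Response = toString(Status)),
                by = list(Date1, StudentId)]

# With unique(): each distinct status appears once per group,
# in order of first appearance.
no_dups <- DT[, list(Response = toString(unique(Status))),
              by = list(Date1, StudentId)]

print(with_dups)
print(no_dups)
```

Here the first group comes out as "Yes, No, Yes" without unique and as "Yes, No" with it.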

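Alternatively, you can store the values in a list column instead of a string, using list(Status).

Pros: you can use set operations such as %in%, which may be more intuitive than parsing strings.
Cons: list columns cannot be used in by operations, and they are generally more cumbersome to work with.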
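A minimal sketch of the list-column variant, assuming the data from the question. The names `DT2` and `HasNo` are hypothetical and only used for illustration. Note the extra `list()` wrapper inside `j`, which makes each group produce a single list element holding all of its statuses.

```r
library(data.table)

DT <- data.table(
  Date1     = c("08/04/2014", "08/04/2014", "08/25/2014", "08/27/2014", "08/29/2014"),
  StudentId = c(155261, 155261, 236991, 236991, 236991),
  Status    = c("Yes", "No", "Yes", "Yes", "Yes")
)

# Wrap Status in list() so each group yields one list element
# containing the character vector of its statuses.
DT2 <- DT[, list(Response = list(Status)), by = list(Date1, StudentId)]

# Set-style query: flag the groups that contain at least one "No".
DT2[, HasNo := vapply(Response, function(x) "No" %in% x, logical(1))]

print(DT2)
```

The membership test works directly on the stored vectors, so no string splitting is needed. The trade-off mentioned above still applies: Response itself cannot be used as a grouping column.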
