Is there a way to automagically group by (almost) all columns in data.table

Question

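Basically, I have a dataset with a large number of columns, and it may grow even further in the future.

Before I analyze the data, in most cases it makes sense to group by all of the columns. I know I could type them all out by hand, but I was wondering whether there is a way to do this automatically.

For example, consider a list of invoice items where many of the attributes really just further describe the product (the data is heavily denormalized), e.g.: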

InvoiceId     ProductId    Price   CustomerName   SomeOtherProductAttribute...
123           ABC          32.11   CustA          xyz
123           BBB          99.99   CustA          xyzy
444           ABC          32.11   CustB          xyz
444           CCC          12.99   CustB          ttt

I want to sum the prices:

DT[, sum(Price), by = list(InvoiceId, ProductId, CustomerName, SomeOtherProductAttribute)]

Answer 1

Use ddply from the plyr package:

library(plyr)
# Group by every column except the one being summed.
var_group <- setdiff(colnames(data), "Price")
# Sum Price within each group.
ddply(data, var_group, summarise, price_sum = sum(Price))

Answer 2

You can use setdiff:

DT[, sum(Price), by = setdiff(names(DT), "Price")]
   InvoiceId ProductId CustomerName SomeOtherProductAttribute...    V1
1:       123       ABC        CustA                          xyz 32.11
2:       123       BBB        CustA                         xyzy 99.99
3:       444       ABC        CustB                          xyz 32.11
4:       444       CCC        CustB                          ttt 12.99
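
If you want the result to keep the name Price instead of V1, or if there is more than one column to aggregate, the same setdiff idea can be combined with .SD and .SDcols. The following is only a minimal sketch under some assumptions: the toy data, the extra duplicated CCC row, and the column name Attr (a stand-in for the elided SomeOtherProductAttribute... column) are made up for illustration, as are the variable names value_cols and group_cols.

```r
library(data.table)

# Toy data mirroring the question's table. "Attr" is a hypothetical
# stand-in for the elided SomeOtherProductAttribute... column, and the
# last row is an invented duplicate so that grouping has something to collapse.
DT <- data.table(
  InvoiceId    = c(123, 123, 444, 444, 444),
  ProductId    = c("ABC", "BBB", "ABC", "CCC", "CCC"),
  Price        = c(32.11, 99.99, 32.11, 12.99, 12.99),
  CustomerName = c("CustA", "CustA", "CustB", "CustB", "CustB"),
  Attr         = c("xyz", "xyzy", "xyz", "ttt", "ttt")
)

# Columns to aggregate; every other column becomes a grouping column,
# so newly added descriptive columns are picked up automatically.
value_cols <- "Price"
group_cols <- setdiff(names(DT), value_cols)

# Sum each value column within each group, keeping its original name.
res <- DT[, lapply(.SD, sum), by = group_cols, .SDcols = value_cols]
print(res)
```

Here the two identical CCC rows collapse into a single row, and the summed column is still called Price. Adding more names to value_cols should aggregate those columns in the same call.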

r

data.table