Merging data frames in a list

Question

This is an offshoot of a previous question of mine, which built a discussion around simplifying my function and eliminating the need to merge the data frames that result from an lapply. Although tools such as dplyr and data.table eliminate the need for the merge, I'd still like to know how to merge in this situation. I have simplified the function that produces the list based on that discussion.

#Reproducible data
Data <- data.frame("custID" = c(1:10, 1:20),
    "v1" = rep(c("A", "B"), c(10,20)), 
    "v2" = c(30:21, 20:19, 1:3, 20:6), stringsAsFactors = TRUE)

#Split-Apply function
res <- lapply(split(Data, Data$v1), function(df) {
    cutoff <- quantile(df$v2, c(0.8, 0.9))
    top_pct <- ifelse(df$v2 > cutoff[2], 10, ifelse(df$v2 > cutoff[1], 20, NA))
    na.omit(data.frame(custID = df$custID, top_pct))
    })

This gives me the following result:

$A
  custID top_pct
1      1      10
2      2      20

$B
  custID top_pct
1      1      10
2      2      20
6      6      10
7      7      20

I would like the result to look like this:

  custID A_top_pct B_top_pct
1      1        10        10
2      2        20        20
3      6        NA        10
4      7        NA        20

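What is the best way to get there? Should I do some sort of reshape? If so, do I have to merge the data frames first?

Here is my solution, which may not be the best. (In the actual application, the list will contain more than two data frames.)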

#Change the new variable name
names1 <- names(res)

for (i in seq_along(res)) {
    names(res[[i]])[2] <- paste0(names1[i], "_top_pct")
}

#Merge the results
res_m <- res[[1]]
for (i in seq_along(res)[-1]) {
    res_m <- merge(res_m, res[[i]], by = "custID", all = TRUE)
}

Answer 1

You can try Reduce with merge:

 Reduce(function(...) merge(..., by='custID', all=TRUE), res)
 #     custID top_pct.x top_pct.y
 #1      1        10        10
 #2      2        20        20
 #3      6        NA        10
 #4      7        NA        20
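The output above keeps merge's default .x/.y suffixes rather than the A_top_pct / B_top_pct names asked for, and those suffixes can collide once the list holds more than two data frames. A minimal sketch of one way around that, assuming res is the named list from the question (rebuilt here so the block runs on its own); res_named is an illustrative name, not something from the original post:

```r
# Rebuild the list from the question so this block is self-contained
Data <- data.frame(custID = c(1:10, 1:20),
                   v1 = rep(c("A", "B"), c(10, 20)),
                   v2 = c(30:21, 20:19, 1:3, 20:6),
                   stringsAsFactors = TRUE)

res <- lapply(split(Data, Data$v1), function(df) {
    cutoff <- quantile(df$v2, c(0.8, 0.9))
    top_pct <- ifelse(df$v2 > cutoff[2], 10, ifelse(df$v2 > cutoff[1], 20, NA))
    na.omit(data.frame(custID = df$custID, top_pct))
})

# Prefix the value column of each data frame with its list name,
# so every column is unique before merging (res_named is a hypothetical name)
res_named <- Map(function(df, nm) {
    names(df)[names(df) == "top_pct"] <- paste0(nm, "_top_pct")
    df
}, res, names(res))

# Fold the list into one data frame with a full outer join on custID
res_m <- Reduce(function(x, y) merge(x, y, by = "custID", all = TRUE), res_named)
res_m
```

Because each value column already has a unique name, merge never needs to add suffixes, so the same call should work unchanged for a list with any number of groups.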

Alternatively, as @Colonel Beauvel suggested, the Reduce call reads more cleanly when merge is wrapped with Curry from library(functional):

 library(functional)
 Reduce(Curry(merge, by='custID', all=TRUE), res)
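On the reshape part of the question: merging first is not required. A rough sketch of the alternative, assuming the same res list as in the question (rebuilt here so the block is self-contained): stack the list into one long data frame, then let base reshape spread it to wide form. The names grp, long and wide are illustrative only.

```r
# Rebuild the list from the question so this block is self-contained
Data <- data.frame(custID = c(1:10, 1:20),
                   v1 = rep(c("A", "B"), c(10, 20)),
                   v2 = c(30:21, 20:19, 1:3, 20:6),
                   stringsAsFactors = TRUE)

res <- lapply(split(Data, Data$v1), function(df) {
    cutoff <- quantile(df$v2, c(0.8, 0.9))
    top_pct <- ifelse(df$v2 > cutoff[2], 10, ifelse(df$v2 > cutoff[1], 20, NA))
    na.omit(data.frame(custID = df$custID, top_pct))
})

# Stack the list into one long data frame, recording the list name in `grp`
long <- do.call(rbind, Map(function(df, nm) cbind(df, grp = nm), res, names(res)))

# Spread to wide form: one row per custID, one top_pct column per group
wide <- reshape(long, idvar = "custID", timevar = "grp",
                direction = "wide", sep = "_")
wide
```

Note that reshape builds the column names as value name plus group (top_pct_A, top_pct_B), so a rename step would still be needed to get the exact A_top_pct layout shown in the question.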
