Merging data frames in a list
This is an offshoot of my previous question, which built a discussion around simplifying my function and eliminating the need for merging data frames that result from an lapply. Although tools such as dplyr and data.table eliminate the need for the merging, I'd still like to know how to merge in this situation. I have simplified the function that produces the list, based on that discussion.
# Reproducible data
Data <- data.frame("custID" = c(1:10, 1:20),
                   "v1" = rep(c("A", "B"), c(10, 20)),
                   "v2" = c(30:21, 20:19, 1:3, 20:6), stringsAsFactors = TRUE)
# Split-apply function
res <- lapply(split(Data, Data$v1), function(df) {
  cutoff <- quantile(df$v2, c(0.8, 0.9))
  top_pct <- ifelse(df$v2 > cutoff[2], 10, ifelse(df$v2 > cutoff[1], 20, NA))
  na.omit(data.frame(custID = df$custID, top_pct))
})
This gives me the following result:
$A
  custID top_pct
1      1      10
2      2      20

$B
  custID top_pct
1      1      10
2      2      20
6      6      10
7      7      20
I'd like the result to look like this:
  custID A_top_pct B_top_pct
1      1        10        10
2      2        20        20
3      6        NA        10
4      7        NA        20
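What is the best way to get there? Should I do some sort of reshaping? If so, do I have to merge the data frames first?
Here is my solution, which may not be the best. (In the actual application there will be more than two data frames in the list.)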
# Change the new variable name
names1 <- names(res)
for (i in seq_along(res)) {
  names(res[[i]])[2] <- paste0(names1[i], "_top_pct")
}
# Merge the results
res_m <- res[[1]]
for (i in seq_along(res)[-1]) {
  res_m <- merge(res_m, res[[i]], by = "custID", all = TRUE)
}
You can try Reduce with merge:
Reduce(function(...) merge(..., by='custID', all=TRUE), res)
#   custID top_pct.x top_pct.y
# 1      1        10        10
# 2      2        20        20
# 3      6        NA        10
# 4      7        NA        20
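Note that the merged columns above come out as top_pct.x and top_pct.y rather than the A_top_pct and B_top_pct asked for in the question, and with more than two data frames the automatic suffixes start to collide. A minimal sketch of one way around that, assuming each list element has a column literally named top_pct and that the list is named: rename each value column with its list name first, then Reduce over the renamed list. The names renamed and res_m are only illustrative.

```r
# Rebuild the example list from the question so this block runs on its own.
Data <- data.frame(custID = c(1:10, 1:20),
                   v1 = rep(c("A", "B"), c(10, 20)),
                   v2 = c(30:21, 20:19, 1:3, 20:6),
                   stringsAsFactors = TRUE)
res <- lapply(split(Data, Data$v1), function(df) {
  cutoff <- quantile(df$v2, c(0.8, 0.9))
  top_pct <- ifelse(df$v2 > cutoff[2], 10, ifelse(df$v2 > cutoff[1], 20, NA))
  na.omit(data.frame(custID = df$custID, top_pct))
})

# Prefix the value column of each data frame with its list name
# (assumes the column is called "top_pct" in every element).
renamed <- Map(function(df, nm) {
  names(df)[names(df) == "top_pct"] <- paste0(nm, "_top_pct")
  df
}, res, names(res))

# Full outer join on custID across every element of the list.
res_m <- Reduce(function(x, y) merge(x, y, by = "custID", all = TRUE), renamed)
res_m
```

Because every value column already has a unique name before merging, merge never needs to add the .x and .y suffixes, so the same two steps should carry over to a list with more than two data frames.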
Alternatively, as @Colonel Beauvel suggested, a more readable way to write the same call is to wrap merge with Curry from the functional package:
library(functional)
Reduce(Curry(merge, by = 'custID', all = TRUE), res)
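On the question of whether reshaping requires merging first: it does not have to. A rough sketch of a different route, not the Reduce approach above, is to stack the list into one long data frame with a group column and then spread it with base reshape. The names long, wide, and grp are hypothetical, and reshape names the resulting columns top_pct_A and top_pct_B rather than A_top_pct and B_top_pct.

```r
# Rebuild the example list from the question so this block runs on its own.
Data <- data.frame(custID = c(1:10, 1:20),
                   v1 = rep(c("A", "B"), c(10, 20)),
                   v2 = c(30:21, 20:19, 1:3, 20:6),
                   stringsAsFactors = TRUE)
res <- lapply(split(Data, Data$v1), function(df) {
  cutoff <- quantile(df$v2, c(0.8, 0.9))
  top_pct <- ifelse(df$v2 > cutoff[2], 10, ifelse(df$v2 > cutoff[1], 20, NA))
  na.omit(data.frame(custID = df$custID, top_pct))
})

# Stack all list elements into one long data frame, tagging each row
# with the name of the list element it came from.
long <- do.call(rbind, Map(function(df, nm) cbind(df, grp = nm),
                           res, names(res)))

# Spread to one row per custID and one top_pct column per group;
# customers missing from a group get NA in that group's column.
wide <- reshape(long, idvar = "custID", timevar = "grp",
                direction = "wide", sep = "_")
wide
```

If the A_top_pct style of name matters, the columns can be renamed afterwards; the merge-based route avoids that extra step.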