How to merge multiple data frames that have partly the same column names based on a key variable?
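I have multiple data frames (around 20; only 3 simplified ones are shown here) with some overlapping column names. The key variable is "id", and I want to merge the data frames based on this key. No additional columns should be created. On the other hand, I want to avoid duplicate rows, so that rows with the same key are combined to fill in as many empty fields as possible while leaving only one row per key.
I have already tried "rbind.fill", but although it fills the columns correctly, it creates duplicate rows. If I try "merge", on the other hand, it returns an empty data frame.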
df1<- cbind.data.frame(id=c(1,2,3,4), price= c(15,16,20,25), color= c("Black", NA, "White", "Green"), weight= c(5,6,10,12))
df2<- cbind.data.frame(id=c(3,4,5,6), price=c(NA, NA, 23,30), weight=c(10,12,NA, NA), battery= c("low", "high", NA, NA))
df3<- cbind.data.frame(id=c(5,6,7,8), weight= c(NA, 15,17,NA), battery= c("low", "high","high", NA), surface= c(100,115,NA, NA))
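df_list <- list(df1,df2,df3)
# Attempt 1: successive merges on "id" (returns an empty data frame)
df5<-Reduce(function(d1, d2) merge(d1, d2, by = "id"),df_list)
# Attempt 2: row-bind with column filling (fills columns but keeps duplicate ids)
library(plyr)
df6 <- rbind.fill(df1,df2,df3)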
I would like the output to be a data frame like this:
df4 <- cbind.data.frame(id=c(1,2,3,4,5,6,7,8), price= c(15,16,20,25,23,30,17,NA),color= c("Black", NA, "White", "Green", NA, NA, NA, NA),weight= c(5,6,10,12,NA, 15,NA,NA), battery= c(NA, NA,"low", "high","low", "high","high", NA), surface= c(NA, NA, NA, NA,100,115,NA, NA))
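As a side note on why the "merge" attempt comes back empty: merge() performs an inner join by default, and no "id" value occurs in all three data frames. The sketch below rebuilds the toy data with data.frame() so that it runs on its own; the names common_ids, inner and outer are introduced here purely for illustration.

```r
# Same toy data as in the question, rebuilt so this block is self-contained.
df1 <- data.frame(id = c(1, 2, 3, 4), price = c(15, 16, 20, 25),
                  color = c("Black", NA, "White", "Green"), weight = c(5, 6, 10, 12))
df2 <- data.frame(id = c(3, 4, 5, 6), price = c(NA, NA, 23, 30),
                  weight = c(10, 12, NA, NA), battery = c("low", "high", NA, NA))
df3 <- data.frame(id = c(5, 6, 7, 8), weight = c(NA, 15, 17, NA),
                  battery = c("low", "high", "high", NA), surface = c(100, 115, NA, NA))
df_list <- list(df1, df2, df3)

# merge() defaults to all = FALSE (inner join): only ids present in every frame survive.
common_ids <- Reduce(intersect, lapply(df_list, `[[`, "id"))
length(common_ids)  # no id is shared by all three frames

inner <- Reduce(function(d1, d2) merge(d1, d2, by = "id"), df_list)
nrow(inner)         # hence no rows are left after the chained inner joins

# all = TRUE (full outer join) keeps every id, but columns that share a name
# are not combined; they come back as suffixed copies such as price.x / price.y.
outer <- Reduce(function(d1, d2) merge(d1, d2, by = "id", all = TRUE), df_list)
nrow(outer)
names(outer)
```

So a full outer join alone keeps one row per id but violates the "no additional columns" requirement, which is why a stack-then-collapse approach is a more natural fit here.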
With dplyr:
df_list <- list(df1,df2,df3)
library(dplyr)
bind_rows(df_list) %>%
  group_by(id) %>%
  summarise_all(~first(na.omit(.)))
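I assume that the non-missing fields will match across the data frames, so I simply pick the first observed value for each field.
Result: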
# A tibble: 8 x 6
     id price color weight battery surface
  <dbl> <dbl> <fct>  <dbl> <fct>     <dbl>
1     1    15 Black      5 NA           NA
2     2    16 NA         6 NA           NA
3     3    20 White     10 low          NA
4     4    25 Green     12 high         NA
5     5    23 NA        NA low         100
6     6    30 NA        15 high        115
7     7    NA NA        17 high         NA
8     8    NA NA        NA NA           NA
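For newer dplyr versions (1.0.0 and later), summarise_all() is superseded by across(), so the same idea could be written roughly as follows. This is only a sketch under the same assumption that non-missing values agree across data frames; first_non_na is a hypothetical helper introduced here, and the toy data is rebuilt with data.frame() so the block is self-contained.

```r
library(dplyr)

# Same toy data as in the question.
df1 <- data.frame(id = c(1, 2, 3, 4), price = c(15, 16, 20, 25),
                  color = c("Black", NA, "White", "Green"), weight = c(5, 6, 10, 12))
df2 <- data.frame(id = c(3, 4, 5, 6), price = c(NA, NA, 23, 30),
                  weight = c(10, 12, NA, NA), battery = c("low", "high", NA, NA))
df3 <- data.frame(id = c(5, 6, 7, 8), weight = c(NA, 15, 17, NA),
                  battery = c("low", "high", "high", NA), surface = c(100, 115, NA, NA))

# Hypothetical helper: first non-missing value of a vector,
# or NA (of the same type) when every value is missing.
first_non_na <- function(x) x[!is.na(x)][1]

merged <- bind_rows(df1, df2, df3) %>%   # stack rows, filling absent columns with NA
  group_by(id) %>%                       # one group per key
  summarise(across(everything(), first_non_na), .groups = "drop")

merged
```

With around 20 data frames, the list form bind_rows(df_list) works the same way. If values for the same id could conflict between data frames, taking the first non-missing value silently hides the conflict, so that assumption is worth checking on the real data.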