Merge several data frames having the same variables and observations

I have several CSV files, one per year. Each file contains the same variables and observations.

df14 <- data.frame(name = c("one", "two", "three"), A = c(1,2,3), B = c(4, 2, 1), C = c(0, 1, 1))
df15 <- data.frame(name = c("one", "two", "three"), A = c(3,1,1), C = c(0, 0, 1), B = c(8, 5, 5))

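Assume df14 and df15 represent 2014 and 2015 respectively.

Note: the variables are not recorded in the same order in every file.

What I want to do is see how each variable (A, B, C) changes by year for each name.

Is there a way to merge them into a single data frame? Should I simply rbind them?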

Update:

One thing I can do is assign the year as a new variable and then rbind, but is that good practice?

df14$year <- 2014; df15$year <- 2015
df <- rbind(df14, df15)

which gives:

   name A B C year
   one 1 4 0 2014
   two 2 2 1 2014
   three 3 1 1 2014
   one 3 8 0 2015
   two 1 5 0 2015
   three 1 5 1 2015

Try:

library(data.table)
library(magrittr)
years_2_digt <- 14:15

DT <- 
rbindlist(lapply(years_2_digt, function(y) {
  get(paste0("df", y)) %>% 
  setDT %>% 
  .[, year := y] %>%
  setkeyv("name")
}), use.names = TRUE)  # bind columns by name, not position: df15 stores them as A, C, B


DT.molt <- reshape2::melt(DT, id.vars=c("name", "year"))

library(ggplot2)
ggplot(data=DT.molt, aes(x=year, color=variable, y=value)) + 
    geom_line() + geom_point() + 
    facet_grid(name ~ .) + 
    ggtitle("Change by year and name")

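You can programmatically add a year column to each data frame and then rbind them. Here is an example that relies on being able to get the year for each data frame from its file name. Here, I've stored the sample data frames in a list. In your actual use case, you would read the csv files into a list with something like df.list = sapply(vector_of_file_names, read.csv, simplify = FALSE), which keeps the file names as the list names.
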
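df.list = list(df14=df14, df15=df15)

df.list = lapply(seq_along(df.list), function(i) {
  # Take the two-digit year from the list name ("df14" or "df14.csv") and turn it into 2014
  data.frame(df.list[[i]], 
             year = 2000 + as.numeric(sub(".*(\\d{2})(\\.csv)?$", "\\1", names(df.list)[i])))
})

# rbind matches data frame columns by name, so the differing column order is fine
df = do.call(rbind, df.list)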
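
As a quick check of the year-extraction step on its own, here is a minimal sketch. It assumes file names that end in a two-digit year followed by .csv; the names below are made up for illustration and are not from the question.

```r
# Hypothetical file names (assumption: two-digit year right before ".csv")
file_names <- c("df14.csv", "df15.csv")

# Capture the two digits before ".csv" and convert them to a four-digit year
years <- 2000 + as.numeric(sub(".*(\\d{2})\\.csv$", "\\1", file_names))

print(years)
```

If your files use four-digit years or a different naming scheme, the pattern would need to be adjusted accordingly.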

Here is a working example using lapply:

Make some dummy CSV files:

df14 <- data.frame(name = c("one", "two", "three"), A = c(1,2,3), B = c(4, 2, 1), C = c(0, 1, 1))
df15 <- data.frame(name = c("one", "two", "three"), A = c(3,1,1), C = c(0, 0, 1), B = c(8, 5, 5))
df16 <- data.frame(name = c("one", "two", "three"), C = c(1,2,3), B = c(4, 2, 1), A = c(0, 1, 1))
df17 <- data.frame(name = c("one", "two", "three"), C = c(3,1,1), A = c(0, 0, 1), B = c(8, 5, 5))
# get the names of the dummy data frames (df14 ... df17) only
myNames <- ls(pattern = "^df\\d+$")
# write each data frame to <name>.csv
invisible(lapply(myNames, function(i){write.csv(get(i),paste0(i,".csv"),row.names = FALSE)}))

Solution: read the CSV files, fix the column order by sorting the column names, then rbind them into a single data frame:

#Solution - read CSV, fix columns, rbind
do.call(rbind,
        lapply(list.files(".","^df\\d+\\.csv$"),
               function(i){
                 d <- read.csv(i)
                 res <- d[,sort(colnames(d))]
                 cbind(res,FileName=i)
               }))
# output
#    A B C  name FileName
# 1  1 4 0   one df14.csv
# 2  2 2 1   two df14.csv
# 3  3 1 1 three df14.csv
# 4  3 8 0   one df15.csv
# 5  1 5 0   two df15.csv
# 6  1 5 1 three df15.csv
# 7  0 4 1   one df16.csv
# 8  1 2 2   two df16.csv
# 9  1 1 3 three df16.csv
# 10 0 8 3   one df17.csv
# 11 0 5 1   two df17.csv
# 12 1 5 1 three df17.csv
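
Once everything is in one long data frame with a year column, the original goal (how each variable changes by year for each name) can be approached in base R. The following is only a sketch using the two data frames from the question; the names combined and A_change are made up for illustration, and the same pattern would be repeated for B and C.

```r
df14 <- data.frame(name = c("one", "two", "three"), A = c(1,2,3), B = c(4, 2, 1), C = c(0, 1, 1))
df15 <- data.frame(name = c("one", "two", "three"), A = c(3,1,1), C = c(0, 0, 1), B = c(8, 5, 5))
df14$year <- 2014
df15$year <- 2015

# rbind matches data frame columns by name, so the A, C, B order in df15 is harmless
combined <- rbind(df14, df15)

# Sort so that, within each name, rows run from the earliest to the latest year
combined <- combined[order(combined$name, combined$year), ]

# Year-over-year change of A within each name (NA for the first year of each name)
combined$A_change <- ave(combined$A, combined$name,
                         FUN = function(v) c(NA, diff(v)))

print(combined)
```

Keeping the data in this long layout with an explicit year column is also what plotting and reshaping functions generally expect, so adding the year and then rbind-ing is a reasonable habit.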