Read in CSV files and add a column with the file name

Question

Suppose you have the following 2 files.

file_1_october.csv
file_2_november.csv

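These files have the same columns. I want to read both files into R, which I can easily do with map. I also want each file, once read, to include a column named month that holds the file name. For example, for file_1_october.csv, I want a column called "month" that contains the text "file_1_october.csv".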

For reproducibility, assume file_1_october.csv is

name,age,gender
james,24,male
Sue,21,female

and file_2_november.csv is

name,age,gender
Grey,24,male
Juliet,21,female

I want to read both files but include in each a month column that corresponds to the file name, so that we get:

name,age,gender,month
james,24,male,file_1_october.csv
Sue,21,female,file_1_october.csv

and

name,age,gender,month
Grey,24,male,file_2_november.csv
Juliet,21,female,file_2_november.csv

Answer 1

Maybe something like this?

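library(dplyr) # provides %>% and mutate()

csvlist <- c("file_1_october.csv", "file_2_november.csv")

# read each file and store its file name in a new month column
df_list <- lapply(csvlist, function(x) read.csv(x) %>% mutate(month = x))

# assign each data frame to its own variable: df1, df2, ...
for (i in seq_along(df_list)) {
  assign(paste0("df", i), df_list[[i]])
}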

The two data frames will be saved in df1 and df2.
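If numbered variables are not required, the same idea can be sketched in base R with a named list, so each data frame is looked up by its file name instead of by df1 or df2. This is only a sketch under assumptions: the temporary folder, the sample files written into it, and the helper name read_with_month are made up for illustration and are not part of the answer above.

```r
# Recreate the two sample files from the question in a temporary folder
# (the folder is an assumption so that the sketch is self-contained).
demo_dir <- tempfile("csv_demo_")
dir.create(demo_dir)
writeLines(c("name,age,gender", "james,24,male", "Sue,21,female"),
           file.path(demo_dir, "file_1_october.csv"))
writeLines(c("name,age,gender", "Grey,24,male", "Juliet,21,female"),
           file.path(demo_dir, "file_2_november.csv"))

# Full paths of all CSV files in the folder (pattern is a regular expression).
csv_paths <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Hypothetical helper: read one file and record its file name in `month`.
read_with_month <- function(path) {
  df <- read.csv(path)
  df$month <- basename(path)
  df
}

# Name each list element after its file so it can be looked up directly.
df_list <- lapply(csv_paths, read_with_month)
names(df_list) <- basename(csv_paths)

df_list[["file_1_october.csv"]]
```

Keeping the data frames in one list avoids creating variables with assign(), and a later lapply() over df_list can process every file the same way.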

Answer 2

Here is a (mostly) tidyverse alternative that avoids loops:

library(tidyverse)

csv_names <- list.files(path = "path/", # set the path to your folder with csv files
                        pattern = "\\.csv$", # regular expression matching all csv files in the folder
                        full.names = TRUE) # output full file names (with path)
# csv_names <- c("file_1_october.csv", "file_2_november.csv")

csv_names2 <- data.frame(month = basename(csv_names), # file name without the folder path
                         id = as.character(seq_along(csv_names))) # id for joining

data <- csv_names %>% 
  lapply(read_csv) %>% # read all the files at once
  bind_rows(.id = "id") %>% # bind all tables into one object, and give id for each
  left_join(csv_names2, by = "id") # join month column created earlier

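This gives a single data object containing the data from all the CSVs. If you need them separately, you can omit the bind_rows() step, which leaves you with a list of multiple tables ("tibbles"); left_join() would then have to be applied to each table rather than to the list. The tables can then be split out using list2env() or some split() function.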
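The split() and list2env() route mentioned above could look roughly like the following base R sketch. It assumes the combined table already carries the month column. The temporary folder, the sample files, and the object names combined and by_file are illustrative assumptions, and read.csv() stands in for read_csv() only to keep the sketch free of package dependencies.

```r
# Recreate the two sample files from the question in a temporary folder
# (an assumption made so that the sketch is self-contained).
demo_dir <- tempfile("csv_split_demo_")
dir.create(demo_dir)
writeLines(c("name,age,gender", "james,24,male", "Sue,21,female"),
           file.path(demo_dir, "file_1_october.csv"))
writeLines(c("name,age,gender", "Grey,24,male", "Juliet,21,female"),
           file.path(demo_dir, "file_2_november.csv"))

csv_paths <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read every file, tag it with its file name, and stack the results.
combined <- do.call(rbind, lapply(csv_paths, function(path) {
  df <- read.csv(path)
  df$month <- basename(path)
  df
}))

# split() returns a named list with one data frame per distinct `month` value.
by_file <- split(combined, combined$month)

# Optional: turn the list into separate objects in the current environment.
# The names are cleaned first because "file_1_october.csv" is not a
# syntactic R name.
names(by_file) <- make.names(tools::file_path_sans_ext(names(by_file)))
invisible(list2env(by_file, envir = environment()))

file_2_november
```

After list2env(), the objects file_1_october and file_2_november exist alongside the combined table. Keeping the list by_file is often enough on its own, since it can be indexed by name.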

