Read specific csv files in multiple directories with R

Question

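I need to use R to read specific csv files stored in multiple directories. Every directory contains these files (along with others), but the files are named differently in each directory. The names do contain distinguishing characters that make the files easy to identify.

Say the csv files I want to read contain the following distinguishing characters: "1" (file 1) and "2" (file 2).

Here is the code I have tried so far: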

# This is the main directory where all the sub-dirs with files are stored
common_path = "~/my/main/directory"

# Extract the names of the sub-dir
primary_dirs = list.files(common_path) 

# Create empty list of lists
data_lst = rep(list(list()), length(primary_dirs)) # one list per each directory

# These are the 2 files (by code) that I need to read
names_csv = c('1', '2')

#### Nested for loop reading the csv files into the list of lists
for (i in seq_along(primary_dirs)) {

    for (j in seq_along(names_csv)) {

        # [[j]] stores the whole data frame as a single list element
        data_lst[[i]][[j]] = read.csv(paste(common_path, '/', primary_dirs[i],
                                            '/name_file', names_csv[j], '.csv', sep = ''))

    }
}
### End of nested loop

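The problem is that this code only works when the file names are the same in every directory, and that is not my case. Each directory has different file names, but the names always contain the distinguishing characters "1" and "2".

In the example above, the files in every directory are called 'name_file1.csv' and 'name_file2.csv'. In my real case, the file names look like this: dir 1 -> 'name_bla_1.csv', 'name_bla_2.csv'; dir 2 -> 'name_gya_1.csv', 'name_gya_2.csv'; etc.

How can I read these two differently named files from all of my directories?

Thanks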

Answer 1

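You are overcomplicating this. list.files can search recursively (inside subdirectories), can return full file paths so you don't have to paste the paths together yourself, and can match regular-expression patterns.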

files_to_read = list.files(
  path = common_path,        # directory to search within
  pattern = ".*(1|2).*csv$", # regex pattern, some explanation below
  recursive = TRUE,          # search subdirectories
  full.names = TRUE          # return the full path
)
data_lst = lapply(files_to_read, read.csv)  # read all the matching files

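To learn more about regular expressions, I recommend regex101.com. .* matches any sequence of characters, (1|2) matches 1 or 2, and $ matches the end of the string, so ".*(1|2).*csv$" matches every string that contains 1 or 2 and ends with csv.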
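To see what that pattern accepts, it can be tried against a few names with grepl before touching the file system. This is only a sketch: the file names below are made up for illustration, and the stricter pattern is an assumed alternative, not part of the answer above.

```r
# Hypothetical file names, used only to illustrate the patterns
file_names <- c("name_bla_1.csv", "name_gya_2.csv", "name_bla_3.csv",
                "name_bla_10.csv", "notes_1.txt")

loose  <- ".*(1|2).*csv$"   # pattern from the answer: a 1 or 2 anywhere, csv at the end
strict <- "_(1|2)\\.csv$"   # assumed stricter variant: name must end in "_1.csv" or "_2.csv"

loose_match  <- grepl(loose, file_names)
strict_match <- grepl(strict, file_names)

# Show which names each pattern keeps
print(file_names[loose_match])
print(file_names[strict_match])
```

The loose pattern also keeps "name_bla_10.csv" because that name contains a 1. If the real directories hold files like that, a tighter pattern along the lines of the second one may be safer.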

Answer 2

If you just want to read every matching file name from any subdirectory, you could try this:

# "name_", then one or more letters, then "_"
regular_expression <- "name_[A-Za-z]+_"
names_csv <- c('1', '2')
# The backslash must be doubled inside an R string to get a literal dot
names_to_read <- paste0(regular_expression, names_csv, "\\.csv", collapse = "|")
fileList <- list.files(pattern = names_to_read, path = common_path, 
                       recursive = TRUE, full.names = TRUE)    
# Read every matching file into one list element
data_lst <- lapply(fileList, read.csv)

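The output should be a list in which each entry is one of your csv files.

It wasn't clear to me whether you want to keep the files separated by the directory they were read from, so I did not include that.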
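If the per-directory separation does matter, one possible approach is to split the file paths by their parent directory before reading them. The sketch below is an assumption about what is wanted, not something from the question: it builds a small throwaway directory tree under tempdir() with made-up file names so that it runs on its own, and it redefines common_path to point at that demo tree.

```r
# Demo setup (hypothetical): a temporary tree standing in for the real main directory
common_path <- file.path(tempdir(), "main_directory_demo")
demo_files <- c("dir1/name_bla_1.csv", "dir1/name_bla_2.csv", "dir1/other.csv",
                "dir2/name_gya_1.csv", "dir2/name_gya_2.csv")
for (f in demo_files) {
  target <- file.path(common_path, f)
  dir.create(dirname(target), recursive = TRUE, showWarnings = FALSE)
  write.csv(data.frame(x = 1:2), target, row.names = FALSE)
}

# Find the files whose names end in "_1.csv" or "_2.csv"
files <- list.files(common_path, pattern = "_(1|2)\\.csv$",
                    recursive = TRUE, full.names = TRUE)

# Group the paths by the name of the directory that contains them
by_dir <- split(files, basename(dirname(files)))

# Read each group into its own named list of data frames
data_lst <- lapply(by_dir, function(paths) {
  dfs <- lapply(paths, read.csv)
  names(dfs) <- basename(paths)
  dfs
})

str(data_lst, max.level = 2)
```

With this layout a single file can be reached by directory and file name, for example data_lst$dir1[["name_bla_1.csv"]]. Grouping by basename(dirname(...)) assumes the sub-directory names are unique. With deeper nesting, grouping by the full dirname(files) would avoid collisions.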
