Combining >100 xlsx files, issue with POSIX date format due to differing columns in files

Question

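I am fairly new to all of this and would appreciate some help.

I am trying to combine more than 100 xlsx files in R. Their columns are mostly similar, so I plan to use rbind.fill on the list afterwards. However, I am struggling to merge the list into one large data frame because of a date format problem. I get the following error: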

Error in as.POSIXlt.character(x, tz, ...) : 
  character string is not in a standard unambiguous format

This is my code so far:

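# List the files in the folder; full.names = TRUE keeps the "data/" prefix
# so that read_xlsx() can find them from the working directory
file_list <- list.files(path = "data/", full.names = TRUE)

# Read the data into a list
library(readxl)
library(plyr)  # provides rbind.fill

All <- lapply(file_list, function(filename) {
    print(paste("Merging", filename, sep = " "))
    read_xlsx(filename)
})

# Merge into one data frame
df <- do.call(rbind.fill, All)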

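This is where I get the error. I think it is because the date columns are formatted differently in some of the files.

Question: is there a way to lapply a function (or something similar) so that all the date columns in the list end up in the same format? Am I missing something really obvious here? If there is a way to convert them to character and then back to dates, I would be fine with that as a quick fix, but I am not sure how to do it. Thanks for your help.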

Answer 1

If you know all of the possible date column names, then you might do this before the rbind:

dtcols <- c("date", "Somedate", "date123")
All <- lapply(All, function(dat) {
  cols <- intersect(dtcols, names(dat))
  dat[cols] <- lapply(dat[cols], as.Date)
  dat
})

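as.Date is idempotent, so it is safe to use even if a column is already of class Date. If cols is empty (none of the columns were found), this is still safe and does nothing.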
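To make those two safety claims concrete, here is a minimal sketch on toy data. The data frames a and b and the wrapper name to_date are made up for illustration; only the body of the function comes from the snippet above.

```r
# Hypothetical stand-ins for two imported sheets
a <- data.frame(date = as.Date("2020-01-15"), x = 1)  # already class Date
b <- data.frame(x = 2)                                # no date column at all

dtcols <- c("date", "Somedate", "date123")

# Same logic as above, wrapped in a named function (the name is an assumption)
to_date <- function(dat) {
  cols <- intersect(dtcols, names(dat))
  dat[cols] <- lapply(dat[cols], as.Date)
  dat
}

a2 <- to_date(a)
b2 <- to_date(b)

class(a2$date)      # still "Date"; the values are unchanged
a2$date == a$date   # TRUE
names(b2)           # still just "x"; nothing was added or converted
```

Running the conversion a second time on a2 leaves it unchanged as well, so the step can sit in a pipeline without tracking which files have already been cleaned.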

Depending on the source data, you may need to supply origin or format to as.Date, such as

  dat[cols] <- lapply(dat[cols], as.Date, format = "%m/%d/%Y")
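If the same column arrives as different types in different files, one possible approach is a small helper that picks the conversion by class. This is only a sketch under assumptions: the helper name to_date_mixed is hypothetical, the "%m/%d/%Y" default is taken from the example above and may not match your files, and numeric values are assumed to be Excel serial day counts from the Windows 1900 date system, whose origin is "1899-12-30".

```r
# Hypothetical helper: choose the conversion based on the column's class
to_date_mixed <- function(x, format = "%m/%d/%Y") {
  if (inherits(x, "Date")) return(x)                   # already a Date
  if (inherits(x, "POSIXt")) return(as.Date(x))        # date-time: drop the time part
  if (is.numeric(x)) {
    return(as.Date(x, origin = "1899-12-30"))          # assumed Excel serial number
  }
  as.Date(as.character(x), format = format)            # text in the given format
}

to_date_mixed("01/15/2020")                            # character input
to_date_mixed(43831)                                   # numeric input (serial for 2020-01-01)
to_date_mixed(as.POSIXct("2020-01-15", tz = "UTC"))    # date-time input
```

It would replace as.Date in the lapply call, for example dat[cols] <- lapply(dat[cols], to_date_mixed). Strings that do not match format come back as NA rather than raising an error, so it is worth checking for unexpected NA values afterwards.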

Tags: r, lapply