Is there any way to assign the col_types by column names in read_excel (readxl) in R?

Question

My application reads xls and xlsx files using the read_excel function from the readxl package.

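When an xls or xlsx file is read, the order and exact number of its columns are not known in advance. There are 15 predefined columns, of which 10 are mandatory and the remaining 5 are optional. So a file will always have a minimum of 10 columns and a maximum of 15.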

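I need to specify the col_types for the 10 mandatory columns. The only way I can think of is to specify col_types by column name, because I know the file contains all 10 mandatory columns, but they can appear in any order.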

I tried looking for a way to do this but could not find one.

Can anyone help me find a way to assign col_types by column name?

Answer 1

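I solved the problem with the workaround below. It is not the best way to solve it, though: I read the Excel file twice, which will hurt performance if the file holds a very large amount of data.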

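First read: build the vector of column data types. Read the file to retrieve the column information (column names, number of columns, and their types) and build the column_data_types vector, which holds a data type for every column in the file.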

library(readxl)

# first pass: read the .xlsx file to discover its column names
site_data_columns <- read_excel(paste(File$datapath, ".xlsx", sep = ""))

site_data_column_names <- colnames(site_data_columns)

# create the type vector before filling it; every column defaults to "text"
column_data_types <- rep("text", length(site_data_column_names))

for (i in seq_along(site_data_column_names)) {

    # where date is a column name
    if (site_data_column_names[i] == "date") {
        column_data_types[i] <- "date"

    # where result is a column name
    } else if (site_data_column_names[i] == "result") {
        column_data_types[i] <- "numeric"
    }
}
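
As a side note, the loop above can also be written as a lookup in a named vector, which keeps the name-to-type rules in one place. This is only a sketch: build_col_types and example_names are hypothetical names made up for illustration and are not part of readxl.

```r
# Hypothetical helper: map column names to readxl col_types via a named lookup.
build_col_types <- function(col_names, type_map, default = "text") {
  types <- unname(type_map[col_names])  # NA wherever a name is not in type_map
  types[is.na(types)] <- default        # unknown columns fall back to the default
  types
}

# Same rules as the loop: "date" -> date, "result" -> numeric, the rest -> text
type_map <- c(date = "date", result = "numeric")

# Column names in an arbitrary order, standing in for colnames(site_data_columns)
example_names <- c("site", "result", "comment", "date")

column_data_types <- build_col_types(example_names, type_map)
print(column_data_types)
```

Because the lookup is by name, the order in which the columns appear in the file does not matter, and optional columns that are absent are never touched.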

Second read: read the file contents. Read the Excel file again, this time passing the column_data_types vector, which holds the data types, as the col_types argument.

#reading .xlsx file
site_data <- read_excel(paste(File$datapath, ".xlsx", sep = ""), col_types = column_data_types)
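
One way to soften the cost of reading the file twice might be to make the first pass fetch only the header row. The sketch below assumes that read_excel accepts n_max = 0 and then returns the column names with no data rows, which is my understanding of readxl 1.0.0 and later. read_excel_typed is a hypothetical wrapper name, not a readxl function.

```r
# Hypothetical wrapper: read only the header first, then re-read with explicit types.
read_excel_typed <- function(path, type_map, default = "text") {
  # Assumption: n_max = 0 yields a zero-row result that still carries the column names
  header <- readxl::read_excel(path, n_max = 0)
  col_names <- colnames(header)

  # Look each column up by name; columns without a rule get the default type
  col_types <- unname(type_map[col_names])
  col_types[is.na(col_types)] <- default

  # Second pass: read the data with one explicit type per column
  readxl::read_excel(path, col_types = col_types)
}

# Usage, with the same path expression and rules as the answer above:
# site_data <- read_excel_typed(paste(File$datapath, ".xlsx", sep = ""),
#                               c(date = "date", result = "numeric"))
```

The file is still opened twice, but the first pass no longer has to parse every row, so the extra cost should depend far less on how much data the sheet holds.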

excel

r

readxl