How can I rewrite an R for(){} loop from dplyr to data.table?

Question

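I have this for(){} loop inside a function that reads specific columns from the files in a folder. Because I have several files, it is slow.

How can I rewrite it in data.table style?

I use arrange() because afterwards I bind the two data frames by name. The names are the same in every file, but they appear in a different order from file to file. So, to bind the class1 and class2 columns by name, I sort each data frame with arrange() first.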

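library(readr)    # read_table()
library(dplyr)    # arrange(), %>%
library(stringr)  # str_remove()

for (i in seq_along(temp)) {

    # File listed in temp: keep only the name and class columns, sorted by name
    df1 <- read_table(temp[[i]],
                      col_types = "c________________f__",
                      col_names = c("name", "class1")) %>% 
      arrange(name)

    # Companion file: same path with "_automat" removed
    df2 <- read_table(str_remove(temp[[i]], "_automat"),
                      col_types = "c________________f__",
                      col_names = c("name", "class2")) %>% 
      arrange(name)
}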

Answer 1

If you only want to convert this to data.table, you can switch from read_table to fread, which should be faster and returns a data.table that you can sort with [order(*)]:

library(data.table)

fread(file=temp[[i]], select = c(name='character', class1='numeric'))[order(name)]
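To make the suggestion concrete, here is a minimal sketch of the whole loop body using fread. It assumes the files are headerless and single-space delimited, and it derives the kept column positions from the col_types string in the question instead of selecting by column name. The helpers read_sorted() and write_demo() and the demo files are hypothetical, added only so the example runs on its own.

```r
library(data.table)

# Column spec copied from the question: "_" means "skip this column".
spec <- "c________________f__"
keep <- which(strsplit(spec, "")[[1]] != "_")  # positions of the kept columns
n_col <- nchar(spec)

# Hypothetical helper: read the kept columns of a headerless file,
# name them, convert types as in the question, and sort by name.
read_sorted <- function(path, class_col) {
  dt <- fread(path, header = FALSE, sep = " ", select = keep)
  setnames(dt, c("name", class_col))
  dt[, name := as.character(name)]
  set(dt, j = class_col, value = factor(dt[[class_col]]))
  setorder(dt, name)
  dt[]
}

# Hypothetical helper: write a small demo file with the same column layout.
write_demo <- function(path, names, classes) {
  df <- as.data.frame(matrix(0L, nrow = length(names), ncol = n_col))
  df[[keep[1]]] <- names
  df[[keep[2]]] <- classes
  fwrite(df, path, sep = " ", col.names = FALSE)
}

# Demo data: same names in both files, in a different order.
demo_dir <- tempfile("demo_")
dir.create(demo_dir)
auto_path <- file.path(demo_dir, "sample_automat.txt")
write_demo(auto_path, c("b", "a", "c"), c(2L, 1L, 3L))
write_demo(file.path(demo_dir, "sample.txt"), c("c", "a", "b"), c(3L, 1L, 2L))
temp <- list(auto_path)

# Same pairing logic as the original loop, collected into a list.
results <- lapply(temp, function(path) {
  df1 <- read_sorted(path, "class1")
  df2 <- read_sorted(sub("_automat", "", path, fixed = TRUE), "class2")
  df1[, class2 := df2$class2]  # rows line up because both are sorted by name
  df1[]
})

print(results[[1]])
```

Binding the columns this way still relies on both files containing exactly the same names. If that is not guaranteed, merge(df1, df2, by = "name") may be a safer way to combine them, since it matches rows by name rather than by position.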

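This will probably improve your speed, but if you want a more significant improvement, I would consider replacing your for loop with a parallel foreach loop from the foreach package. There are many existing questions on how to do this that make a good starting point.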
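As a rough illustration of the parallel approach, the sketch below reads a few files with foreach and %dopar%. It assumes the doParallel package as the parallel backend (other backends exist) and uses small hypothetical demo files with only two columns, so the fread call is simpler than what the real files would need.

```r
library(data.table)
library(foreach)
library(doParallel)

# Demo input: three small headerless files in a temporary folder (hypothetical data).
demo_dir <- tempfile("par_demo_")
dir.create(demo_dir)
temp <- file.path(demo_dir, sprintf("file%d.txt", 1:3))
for (p in temp) {
  fwrite(data.table(name = c("b", "a"), class1 = c(2L, 1L)), p,
         sep = " ", col.names = FALSE)
}

# Start two worker processes and register them as the foreach backend.
cl <- parallel::makeCluster(2)
registerDoParallel(cl)

# Each iteration reads and sorts one file on a worker;
# results come back as a list in the same order as temp.
results <- foreach(path = temp, .packages = "data.table") %dopar% {
  dt <- fread(path, header = FALSE, sep = " ")
  setnames(dt, c("name", "class1"))
  dt[order(name)]
}

parallel::stopCluster(cl)

print(results[[1]])
```

Starting workers and shipping results back has a cost, so for a handful of small files the parallel version may not be faster than a plain loop with fread. It is worth timing both on the real data.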
