Error loading data with too many levels/categories in h2o.importFile()

Question

I am trying to import a large .csv file in R using h2o.importFile():

library(h2o)
h2o.init()
dataFile <- "big_file.csv" 
h2o.importFile(dataFile,header=TRUE,destination_frame = "data.hex")

The file has several id columns. I get the following error message:

Error: water.parser.ParseDataset$H2OParseException: Exceeded categorical limit on columns [id1, id2]. Consider reparsing these columns as a string.

Is there a way to specify these columns' type as string, similar to data.frame(stringsAsFactors = FALSE)?

Answer 1

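Specifying the col.types argument of h2o.importFile() should work for you: it forces the parser to read each column as the given type, so the id columns can be read as "string" instead of being turned into categoricals.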

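# write.csv() also writes the row names as an unnamed first column, so the file has 6 columns
write.csv(iris, "iris.csv")
# one type per column, in file order; Species is forced to "string"
hf0 <- h2o.importFile("iris.csv", col.types = c("int","real","real","real","real","string"))
unlist(h2o.getTypes(hf0))
[1] "int"    "real"   "real"   "real"   "real"   "string"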
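For a file with many columns, typing the whole col.types vector by hand is tedious. Below is a minimal sketch of a hypothetical helper, make_col_types() (it is not part of h2o), that reads a sample of rows with base R's read.csv(), guesses an H2O type for each column ("int", "real" or "enum"), and forces the named columns to "string". The type guessing is an assumption of this sketch: it only looks at the first n_sample rows, so a column that looks integer there but has decimals further down would be mis-typed. Check the returned vector before importing.

```r
# Hypothetical helper (not part of h2o): build a full col.types vector for
# h2o.importFile() by guessing each column's type from a sample of rows and
# forcing the named columns (e.g. high-cardinality id columns) to "string".
make_col_types <- function(path, string_cols, n_sample = 1000) {
  # Read only the first n_sample rows with base R; text stays character.
  sample_df <- utils::read.csv(path, nrows = n_sample,
                               stringsAsFactors = FALSE, check.names = FALSE)
  # Guess an H2O type per column: integers -> "int", other numbers -> "real",
  # everything else (text, logical, all-NA) -> "enum" (categorical).
  types <- vapply(sample_df, function(col) {
    if (is.integer(col)) {
      "int"
    } else if (is.numeric(col)) {
      "real"
    } else {
      "enum"
    }
  }, character(1))
  unknown <- setdiff(string_cols, names(types))
  if (length(unknown) > 0) {
    stop("Unknown column(s): ", paste(unknown, collapse = ", "))
  }
  # Force the requested columns to "string" so H2O does not build a
  # categorical domain for them.
  types[string_cols] <- "string"
  unname(types)
}

# Usage with the question's file (requires a running H2O cluster):
# library(h2o)
# h2o.init()
# hf <- h2o.importFile("big_file.csv", header = TRUE,
#                      destination_frame = "data.hex",
#                      col.types = make_col_types("big_file.csv", c("id1", "id2")))
```

With iris.csv written as in the answer above, make_col_types("iris.csv", "Species") returns the same six types used there.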
