Initialize data.table with heterogeneous types

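I need to build a large data.table in which each row is a user and the columns are attributes of different types. I need to fill the table row by row. How should I initialize it?

For example, if I do this: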

library(data.table)
dt.hetero <- data.table(matrix(-1, nrow=3, ncol=6))
names(dt.hetero) <- c("name", "lastname", "city", "age", "weight", "height")
dt.hetero[1, age:=34]
dt.hetero[1, name:="alice"]

it expects doubles everywhere, so I get a warning when I try to insert a string:

Warning messages:
1: In `[.data.table`(dt.hetero, 1, `:=`(name, "alice")) :
  NAs introduced by coercion
2: In `[.data.table`(dt.hetero, 1, `:=`(name, "alice")) :
  Coerced 'character' RHS to 'double' to match the column's type. Either change the target column to 'character' first (by creating a new 'character' vector length 3 (nrows of entire table) and assign that; i.e. 'replace' column), or coerce RHS to 'double' (e.g. 1L, NA_[real|integer]_, as.*, etc) to make your intent clear and for speed. Or, set the column type correctly up front when you create the table and stick to it, please.
Edit:

I get the user data sequentially. So the process is

for every user:

  • get user data
  • copy user data to row in data.table

return data.table

You can specify the type of each column directly when you create the empty data.table:

dt.hetero <- data.table(name = character(3L), 
                        lastname = character(3L), 
                        city = character(3L), 
                        age = integer(3L), 
                        weight = double(3L), 
                        height = double(3L))

Change the `3L` to the number of rows you actually need.
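
Once the table is preallocated with the right column types, one way to fill it row by row is `set()`, which assigns by reference. This is only a minimal sketch: it assumes each user arrives as a named list whose field names and types match the columns, and the `users` records are made up for illustration.

```r
library(data.table)

n <- 3L
dt.hetero <- data.table(name = character(n),
                        lastname = character(n),
                        city = character(n),
                        age = integer(n),
                        weight = double(n),
                        height = double(n))

# Made-up records standing in for the real user data (assumption)
users <- list(
  list(name = "alice", lastname = "smith", city = "Paris",  age = 34L, weight = 60.5, height = 1.70),
  list(name = "bob",   lastname = "jones", city = "Berlin", age = 41L, weight = 82.0, height = 1.82),
  list(name = "carol", lastname = "lee",   city = "Madrid", age = 29L, weight = 55.3, height = 1.65)
)

for (i in seq_along(users)) {
  u <- users[[i]]
  # Assign every field of row i in place; columns are selected by name
  set(dt.hetero, i = i, j = names(u), value = u)
}
```

Note the `34L`: an integer literal matches the integer `age` column, so nothing has to be coerced.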

I need to fill the table row by row.

If you are doing this by hand, consider...

res <- fread("
  name              age        weight
  Bob               101        111
  Alice             33         77     ")

or...

rows <- list(
  list(name = "Bob"    , age = 101, weight = 111 ),
  list(name = "Alice"  , age = 33 , weight = 77  ) 
)

res2 <- rbindlist(rows)

If you get the data sequentially, the second approach works as well:

rows <- vector("list",3)

rows[[1]] <- list(name = "Bob"    , age = 101, weight = 111 )
rows[[2]] <- list(name = "Alice"  , age = 33 , weight = 77  ) 
rows[[3]] <- list(name = "Cadmus" , age = 44 , weight = 55  ) 

res2 <- rbindlist(rows)

Obviously, this also works in a loop:

for (i in seq_along(rows)){
  # ... do_stuff to find row info ...
  # placeholder values below; put the real row info here
  rows[[i]] <- list(name = paste0("user", i), age = 30 + i, weight = 70 + i)
}
res2 <- rbindlist(rows)
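
To tie this back to the "for every user: get user data, copy to row" process in the question, here is a rough sketch of the whole loop. `get_user_data()` is a hypothetical stand-in for whatever actually fetches one user, and the values it returns are invented. The idea is to preallocate the list rather than the table, and let `rbindlist()` work out the column types at the end.

```r
library(data.table)

# Hypothetical stand-in for whatever fetches one user's data (assumption)
get_user_data <- function(i) {
  list(name   = c("Bob", "Alice", "Cadmus")[i],
       age    = c(101L, 33L, 44L)[i],
       weight = c(111, 77, 55)[i])
}

n_users <- 3L
rows <- vector("list", n_users)   # preallocate the list, not the table
for (i in seq_len(n_users)) {
  rows[[i]] <- get_user_data(i)
}
res2 <- rbindlist(rows)

# Each column keeps its own type: character, integer, numeric
sapply(res2, class)
```

Because every row is a list, mixed types within a row are not a problem, and no column has to be declared up front.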

This is a very slow way of working in R; see the "Second Circle" of The R Inferno. It is more efficient to 'vectorise' the process:

users = c('John','Jill','James')
ages = c(25,53,37)

# of course there is: data.frame(user = users, age=ages), but assuming that's
# not possible in this case..

users_list <- lapply(1:3, FUN=function(i){
  return(data.frame(user = users[i],
                    age = ages[i]))
})

do.call('rbind', users_list)
   user age
1  John  25
2  Jill  53
3 James  37
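
Since the question is about data.table, the same pattern can presumably be written with `rbindlist()` in place of `do.call('rbind', ...)`. A minimal sketch, reusing the `users` and `ages` vectors from above and building plain lists instead of one-row data frames:

```r
library(data.table)

users <- c("John", "Jill", "James")
ages  <- c(25, 53, 37)

# One small named list per user; no one-row data.frame is created
users_list <- lapply(seq_along(users), function(i) {
  list(user = users[i], age = ages[i])
})

# Bind everything once at the end
res <- rbindlist(users_list)
```

Binding once at the end avoids growing the table inside the loop, which is the pattern the R Inferno warns about.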