Initialize data.table with heterogeneous types
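I need to build a large data.table where each row is a user and the columns are attributes of different types. I need to fill the table row by row. How should I initialize it?
For example, if I do this: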
library(data.table)
dt.hetero <- data.table(matrix(-1, nrow=3, ncol=6))
names(dt.hetero) <- c("name", "lastname", "city", "age", "weight", "height")
dt.hetero[1, age:=34]
dt.hetero[1, name:="alice"]
It expects doubles everywhere, so I get warnings when I try to insert a string:
Warning messages:
1: In `[.data.table`(dt.hetero, 1, `:=`(name, "alice")) :
NAs introduced by coercion
2: In `[.data.table`(dt.hetero, 1, `:=`(name, "alice")) :
Coerced 'character' RHS to 'double' to match the column's type. Either change the target column to 'character' first (by creating a new 'character' vector length 3 (nrows of entire table) and assign that; i.e. 'replace' column), or coerce RHS to 'double' (e.g. 1L, NA_[real|integer]_, as.*, etc) to make your intent clear and for speed. Or, set the column type correctly up front when you create the table and stick to it, please.
Edit:
I get the user data sequentially, so the process is:
for every user:
- get user data
- copy user data to row in data.table
return data.table
You can specify the type of each column directly when you create the empty data.table:
dt.hetero <- data.table(name = character(3L),
lastname = character(3L),
city = character(3L),
age = integer(3L),
weight = double(3L),
height = double(3L))
Change the number 3 to the number of rows you actually need.
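Once the table is preallocated with the right column types, one way to fill it row by row is data.table's set(), which assigns by reference. The following is only a minimal sketch: the `incoming` list and its values are made up for illustration, and the column set is shortened to three columns.

```r
library(data.table)

# Hypothetical input: three user records arriving one at a time.
incoming <- list(
  list(name = "alice", age = 34L, weight = 61.5),
  list(name = "bob",   age = 41L, weight = 80.0),
  list(name = "carol", age = 29L, weight = 55.2)
)

# Preallocate with the correct type for every column.
n <- length(incoming)
dt <- data.table(name = character(n), age = integer(n), weight = double(n))

for (i in seq_len(n)) {
  user <- incoming[[i]]
  # set() updates row i in place; j names the target columns,
  # value is a list with one element per column in j.
  set(dt, i = i, j = names(user), value = user)
}

print(dt)
```

Because each value already has the column's type (note the `L` suffix on the integer ages), no coercion warning should be triggered.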
I need to fill the table row by row.
If you are doing this by hand, consider...
res <- fread("
name age weight
Bob 101 111
Alice 33 77 ")
Or...
rows <- list(
list(name = "Bob" , age = 101, weight = 111 ),
list(name = "Alice" , age = 33 , weight = 77 )
)
res2 <- rbindlist(rows)
If you get the data sequentially, the second approach also works:
rows <- vector("list",3)
rows[[1]] <- list(name = "Bob" , age = 101, weight = 111 )
rows[[2]] <- list(name = "Alice" , age = 33 , weight = 77 )
rows[[3]] <- list(name = "Cadmus" , age = 44 , weight = 55 )
res2 <- rbindlist(rows)
Obviously, this also works in a loop:
for (i in seq_along(rows)){
  # ... do_stuff to find row info ...
  # row_info is a placeholder for the named list built above,
  # e.g. list(name = ..., age = ..., weight = ...)
  rows[[i]] <- row_info
}
res2 <- rbindlist(rows)
Filling a table row by row is a very slow way of working in R; see the "Second Circle" of The R Inferno. It is more efficient to 'vectorise' the process:
users = c('John','Jill','James')
ages = c(25,53,37)
# of course there is: data.frame(user = users, age=ages), but assuming that's
# not possible in this case..
users_list <- lapply(1:3, FUN=function(i){
return(data.frame(user = users[i],
age = ages[i]))
})
do.call('rbind', users_list)
user age
1 John 25
2 Jill 53
3 James 37
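Since the question is about data.table, the same pattern can probably be expressed with rbindlist() instead of do.call('rbind', ...). This is a sketch that reuses the `users` and `ages` vectors above and builds plain lists rather than one-row data frames; that substitution is an assumption about what suits the asker's data.

```r
library(data.table)

users <- c("John", "Jill", "James")
ages  <- c(25, 53, 37)

# Build one small named list per user, then bind them all in a single call.
users_list <- lapply(seq_along(users), function(i) {
  list(user = users[i], age = ages[i])
})
res <- rbindlist(users_list)

print(res)
```

The result is a data.table with a character `user` column and a numeric `age` column.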