Issue converting multiple column classes in R data.table
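I want to convert a list of variables in an R data.table, but the conversion has an unintended side effect. I am running R version 4.0.1 with data.table_1.12.8. Here is a simplified example: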
> dput(norw5)
structure(list(Born_before_2016 = c(1L, 1L, 1L, 1L, 1L), gender = c("2.Female",
"1.Male", "2.Female", "1.Male", "1.Male"), payor = c("1:Private",
"1:Private", "4:Other", "4:Other", "1:Private"), Age_in_day = c(0L,
0L, 0L, 4L, 5L)), row.names = c(NA, -5L), class = c("data.table",
"data.frame"), .internal.selfref = <pointer: 0x0000024ce5a41ef0>)
library(data.table)
fact <- c('Born_before_2016', 'gender', 'payor')
varls <- scan(text=fact, what = "", quiet = T)
factcols <- sapply(norw5[,..varls], is.numeric)
norw5new <- norw5[, names(norw5)[factcols] := lapply(.SD, as.character),
.SDcols = factcols]
> dput(norw5new)
structure(list(Born_before_2016 = c("1", "1", "1", "1", "1"),
gender = c("2.Female", "1.Male", "2.Female", "1.Male", "1.Male"
), payor = c("1:Private", "1:Private", "4:Other", "4:Other",
"1:Private"), Age_in_day = c("0", "0", "0", "4", "5")), row.names = c(NA,
-5L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x0000024ce5a41ef0>)
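As shown above, the goal is to find the numeric variables among the listed ones (here, Born_before_2016) and convert them to character. However, the conversion also reached Age_in_day, which is not in the list at all. I can't figure out why. Could the R experts here point me in the right direction?
Thanks!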
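In .SDcols = factcols, factcols should be either a logical vector of length 4 (one element per column of norw5) or a vector of column names or positions, e.g. .SDcols = c("Born_before_2016") or .SDcols = 1. But factcols <- sapply(norw5[,..varls], is.numeric) returns a logical vector of length 3. When that vector indexes the four column names in names(norw5)[factcols], R recycles it to TRUE, FALSE, FALSE, TRUE, so Age_in_day is selected as well.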
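The recycling can be seen in base R alone, without data.table. This is a minimal sketch: the names nms and selected are made up for illustration, and factcols is typed in by hand to match what sapply() returns for the three listed columns.

```r
# Column names of the table from the question
nms <- c("Born_before_2016", "gender", "payor", "Age_in_day")

# Length-3 logical vector, as sapply(..., is.numeric) gives for the listed columns
factcols <- c(Born_before_2016 = TRUE, gender = FALSE, payor = FALSE)

# A logical index shorter than the vector is recycled: TRUE FALSE FALSE TRUE
selected <- nms[factcols]
print(selected)
# [1] "Born_before_2016" "Age_in_day"
```

The fourth column is picked up only because the index wraps around to its first element, which happens to be TRUE.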
It can be fixed as follows:
fact <- c('Born_before_2016','gender','payor')
factcols <- sapply(norw5[,..fact], is.numeric)
cols <- names(norw5)[1:3][factcols]
norw5new <- norw5[,(cols) := lapply(.SD,as.character),.SDcols=cols]
norw5new
#   Born_before_2016   gender     payor Age_in_day
#             <char>   <char>    <char>      <int>
#1:                1 2.Female 1:Private          0
#2:                1   1.Male 1:Private          0
#3:                1 2.Female   4:Other          0
#4:                1   1.Male   4:Other          4
#5:                1   1.Male 1:Private          5
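A variant that does not rely on the listed columns being the first three might look like the sketch below. It rebuilds norw5 from the dput output in the question and subsets fact itself, so the selection is by name only. This is one possible arrangement, not the only way to write it.

```r
library(data.table)

# Rebuild the example table from the question's dput output
norw5 <- data.table(
  Born_before_2016 = c(1L, 1L, 1L, 1L, 1L),
  gender = c("2.Female", "1.Male", "2.Female", "1.Male", "1.Male"),
  payor = c("1:Private", "1:Private", "4:Other", "4:Other", "1:Private"),
  Age_in_day = c(0L, 0L, 0L, 4L, 5L)
)

fact <- c("Born_before_2016", "gender", "payor")

# Keep only the listed columns that are numeric, selected by name
cols <- fact[sapply(norw5[, ..fact], is.numeric)]

# Convert those columns in place; Age_in_day is not in cols and stays integer
norw5[, (cols) := lapply(.SD, as.character), .SDcols = cols]

print(sapply(norw5, class))
```

Note that := modifies the table by reference, so assigning the result to norw5new does not create a separate copy. Use copy(norw5) first if the original must stay unchanged.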