How to transform data.table columns, indexed by position, by reference?
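I have a data.table with several factor columns. I want to convert 2 columns that were originally read in as factors back to their original numeric values. Here is what I tried: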
data[, c(4,5):=c(as.numeric(as.character(4)), as.numeric(as.character(5))), with=FALSE]
This gave me the following warnings:
Warning messages:
1: In `[.data.table`(data, , `:=`(c(4, 5), c(as.numeric(as.character(4)), :
Supplied 2 items to be assigned to 7 items of column 'Bentley (R)' (recycled leaving remainder of 1 items).
2: In `[.data.table`(data, , `:=`(c(4, 5), c(as.numeric(as.character(4)), :
Supplied 2 items to be assigned to 7 items of column 'Sparks (D)' (recycled leaving remainder of 1 items).
3: In `[.data.table`(data, , `:=`(c(4, 5), c(as.numeric(as.character(4)), :
Coerced 'double' RHS to 'integer' to match the factor column's underlying type. Character columns are now recommended (can be in keys), or coerce RHS to integer or character first.
4: In `[.data.table`(data, , `:=`(c(4, 5), c(as.numeric(as.character(4)), :
Coerced 'double' RHS to 'integer' to match the factor column's underlying type. Character columns are now recommended (can be in keys), or coerce RHS to integer or character first.
I can also tell the conversion did not succeed, because columns 4 and 5 are still factors after this code runs.
As an alternative, I tried this code, which does not run at all:
data[, ':=' (4=c(as.numeric(as.character(4)), 5 = as.numeric(as.character(5)))), with=FALSE]
Finally, I tried referencing the column names via colnames:
data[ , (colnames(data)[4]) := as.numeric(as.character(colnames(data)[4]))]
This runs, but it produces NAs along with the following warnings:
Warning messages:
1: In eval(expr, envir, enclos) : NAs introduced by coercion
2: In `[.data.table`(data, , `:=`((colnames(data)[4]), as.numeric(as.character(colnames(data)[4])))) :
Coerced 'double' RHS to 'integer' to match the factor column's underlying type. Character columns are now recommended (can be in keys), or coerce RHS to integer or character first.
3: In `[.data.table`(data, , `:=`((colnames(data)[4]), as.numeric(as.character(colnames(data)[4])))) :
RHS contains -2147483648 which is outside the levels range ([1,6]) of column 1, NAs generated
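I need to do this by position rather than by column name, because the column names depend on the URL. What is the correct way to transform columns by position using data.table?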
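I also have a related question: how do I transform numbered columns relative to other numbered columns? For example, if I want to set column 3 equal to 45 minus the value of column 3 plus the value of column 4, how would I do that? Is there a way to distinguish a real number from a column number? I know something like this is not the way to go: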
dt[ , .(4) = 45 - .(3) + .(4), with = FALSE]
So how can this be done?
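If you want to assign by reference and by position, you need to get the column names as a character vector or the column numbers as an integer vector and use .SDcols (at least in data.table 1.9.4).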
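For context, the attempts in the question fail because a bare number or a column name inside the expression is only a value, not a reference to the column's data. A minimal base R sketch of what those expressions evaluate to; the first three entries of `cols` are hypothetical stand-ins, while 'Bentley (R)' and 'Sparks (D)' come from the warnings above:

```r
# A bare 4 on the right-hand side is just the number 4, not column 4.
literal <- as.numeric(as.character(4))

# Hypothetical column names; only the last two appear in the question.
cols <- c("State", "Date", "Pollster", "Bentley (R)", "Sparks (D)")

# Converting the column *name* to numeric yields NA: it parses the string
# "Bentley (R)", not the values stored in that column.
from_name <- suppressWarnings(as.numeric(as.character(cols[4])))

print(literal)
print(from_name)
```

This is why the columns have to be reached through .SD (or another explicit lookup) rather than through the position or name itself.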
First, a reproducible example:
library(data.table)
DT <- data.table(iris)
DT[, c("Sepal.Length", "Petal.Length") := list(factor(Sepal.Length), factor(Petal.Length))]
str(DT)
Now let's convert the columns:
DT[, names(DT)[c(1, 3)] := lapply(.SD, function(x) as.numeric(as.character(x))),
.SDcols = c(1, 3)]
str(DT)
Or:
DT[, c(1,3) := lapply(.SD, function(x) as.numeric(as.character(x))), .SDcols=c(1,3)]
str(DT)
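The as.character() step in the conversions above matters because calling as.numeric() directly on a factor returns its internal level codes rather than the labels. A small base R sketch with made-up values, offered as an illustration only:

```r
# Made-up values for illustration; levels sort as "4.9", "5.1", "7".
f <- factor(c("5.1", "4.9", "7"))

# Direct conversion returns the integer level codes as doubles.
codes <- as.numeric(f)

# Going through character first recovers the labels as numbers.
values <- as.numeric(as.character(f))

print(codes)
print(values)
```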
Note that := expects a vector of column names or positions on the left-hand side and a list on the right-hand side.
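For the related question about arithmetic between numbered columns, one possible approach (a sketch, not the only way) is to extract columns by position with `[[` and write the result back with data.table's set(). The table `dt`, its column names, and the variables `target_pos` and `other_pos` are hypothetical and exist only for this illustration:

```r
library(data.table)

# Hypothetical toy table; names and values are illustrative only.
dt <- data.table(id = 1:3, grp = c("a", "b", "c"),
                 x = c(10, 20, 30), y = c(1, 2, 3))

# Column positions live in variables; 45 stays an ordinary number.
target_pos <- 3L
other_pos  <- 4L

# dt[[i]] extracts column i as a plain vector, so there is no ambiguity
# between a literal number and a column position. set() then overwrites
# column 3 by reference.
set(dt, j = target_pos, value = 45 - dt[[target_pos]] + dt[[other_pos]])

print(dt)
```

Because positions are only ever used inside `[[` or as the `j` argument of set(), a literal such as 45 is never mistaken for a column number.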