Is a data.table saved and loaded with qs a correct data.table?
I am saving a data.table as a qs file, and when I load it again it does not immediately behave like a data.table.
To clarify what I mean, here is an example:
library(data.table)
library(qs)
n <- 10000
dt <- data.table(x = rnorm(n), y = rnorm(n))
cnames <- colnames(dt)
dt[, new_col_1 := 1]
cnames
[1] "x" "y" "new_col_1"
cnames <- colnames(dt)
dt[, new_col_2 := 1]
cnames
[1] "x" "y" "new_col_1" "new_col_2"
So the result of colnames() behaves like a pointer to the column names of the data.table dt.
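The pointer-like behaviour comes from := modifying the names vector in place, so a variable that still shares that vector sees the change. As a minimal sketch (the variable names alias and snapshot are illustrative, not from the original post), data.table's copy() can be used to take an independent snapshot of the names that later := calls will not touch:

```r
library(data.table)

dt <- data.table(x = 1:3, y = 4:6)

alias    <- colnames(dt)        # may share memory with dt's names attribute
snapshot <- copy(colnames(dt))  # independent copy, unaffected by later :=

dt[, new_col_1 := 1]

snapshot  # still only the two original names
```

Whether alias follows the table depends on the state of the data.table at the time, which is exactly what the rest of this question is about; snapshot does not depend on it.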
But if I do the same thing with a data.table that has been saved and loaded, this happens:
n <- 10000
dt <- data.table(x = rnorm(n), y = rnorm(n))
qs::qsave(dt, "dt_saved.qs")
dt_saved <- qs::qread("dt_saved.qs")
cnames <- colnames(dt_saved)
dt_saved[, new_col_1 := 1]
cnames
[1] "x" "y"
cnames <- colnames(dt_saved)
dt_saved[, new_col_2 := 1]
cnames
[1] "x" "y" "new_col_1" "new_col_2"
So the result of colnames() only behaves like a pointer after the data.table has been modified.
Some additional information:
R version 4.1.1 (2021-08-10)
qs_0.25.1
data.table_1.14.2
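This is because the data.table loaded from the file has no (or no valid) internal self-reference pointer, so the first := cannot update it by reference (there is no reference to use).
You can demonstrate the same thing with a data.table built by hand with structure():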
# create a data.table, but it has no internal.selfref
dt <- structure(list(x = c("A", "B", "C"), y = 1:3), row.names = c(NA,
-3L), class = c("data.table", "data.frame"))
cnames <- colnames(dt)
dt[, new_col_1 := 1]
# Warning message:
# In `[.data.table`(dt, , `:=`(new_col_1, 1)) :
# Invalid .internal.selfref detected and fixed by taking a (shallow) copy of the data.table so that := can add this new column by reference. At an earlier point, this data.table has been copied by R (or was created manually using structure() or similar). Avoid names<- and attr<- which in R currently (and oddly) may copy the whole data.table. Use set* syntax instead to avoid copying: ?set, ?setnames and ?setattr. If this message doesn't help, please report your use case to the data.table issue tracker so the root cause can be fixed or this message improved.
cnames
# [1] "x" "y"
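So you can see that the reference is created at the moment you assign the new column, but cnames is not updated because no reference existed when it was taken.
You can call setDT right after loading the data, so that cnames is also updated by reference: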
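The missing self-reference can also be inspected directly. The following is a rough sketch, not part of the original answer: it compares a table created with data.table() against one built by hand, using attr() to look for the .internal.selfref attribute and data.table's truelength() to see whether spare column slots have been allocated. The names dt_ok and dt_manual are made up for illustration.

```r
library(data.table)

# Built by data.table(): carries a self-reference and spare column slots.
dt_ok <- data.table(x = c("A", "B", "C"), y = 1:3)

# Built by hand: same class, but no .internal.selfref attribute was set.
dt_manual <- structure(list(x = c("A", "B", "C"), y = 1:3),
                       row.names = c(NA, -3L),
                       class = c("data.table", "data.frame"))

is.null(attr(dt_ok, ".internal.selfref"))      # FALSE: the pointer is there
is.null(attr(dt_manual, ".internal.selfref"))  # TRUE: nothing was set

# truelength() reports the allocated column slots; a healthy data.table
# has more slots than columns, so := can add a column in place.
truelength(dt_ok) > length(dt_ok)
```

A table that fails these checks is the kind that triggers the "Invalid .internal.selfref" warning on its first :=.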
dt <- structure(list(x = c("A", "B", "C"), y = 1:3), row.names = c(NA,
-3L), class = c("data.table", "data.frame"))
setDT(dt) # create reference so cnames updates on reference as well
cnames <- colnames(dt)
dt[, new_col_1 := 1]
cnames
# [1] "x" "y" "new_col_1"
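Applied to the qs workflow from the question, the same fix might look like the sketch below. It assumes the qs package is installed and uses a temporary file path chosen purely for illustration; only qs::qsave(), qs::qread() and setDT() are taken from the posts above.

```r
library(data.table)

path <- tempfile(fileext = ".qs")  # hypothetical location for this sketch
dt <- data.table(x = rnorm(10), y = rnorm(10))
qs::qsave(dt, path)

dt_saved <- qs::qread(path)
setDT(dt_saved)  # re-establish the self-reference and over-allocate columns

cnames <- colnames(dt_saved)
dt_saved[, new_col_1 := 1]  # no selfref warning expected after setDT()
cnames
```

Calling setDT() once, immediately after qread(), means every later := works on a table that is already set up for updates by reference.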