Getting Error: Could not find function "." for only one data.table using .I
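I have looked through several questions with similar titles, and I believe my situation is different. To be sure, I stopped my RStudio server, uninstalled data.table, and reinstalled it from source before restarting the RStudio server.
I have a data.table that looks like: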
wind<- structure(list(pricedate = structure(c(1538629200, 1538629200,
1538629200, 1538629200, 1538629200), class = c("POSIXct", "POSIXt"
), tzone = "America/Chicago"), hour = c(1L, 1L, 1L, 1L, 1L),
type = c("cop_hsl", "stwpf", "wgrpp", "cop_hsl", "stwpf"),
zone = c("coastal", "coastal", "coastal", "north", "north"
), as_of = structure(c(1538199804, 1538199804, 1538199804,
1538199804, 1538199804), class = c("POSIXct", "POSIXt"), tzone = "America/Chicago"),
wind = c(712, 751.5, 548.2, 843, 846), age = c("4day", "4day",
"4day", "4day", "4day"), daysold = c(4L, 4L, 4L, 4L, 4L)), row.names = c(NA,
-5L), class = c("data.table", "data.frame"))
The full table has about 20 million rows and takes up 1.1GB of memory, according to tables().
The following commands work:
windindx<-wind[,.I[as_of==max(as_of)], by=.(pricedate, hour)][,V1]
wind[windindx]
Combining them into:
wind[wind[,.I[as_of==max(as_of)], by=.(pricedate, hour)][,V1]]
results in Error: could not find function "."
If I subset the data.table, then it works, like this:
windsm<-wind[round(runif(10000000,0,20676204))]
windsm[windsm[,.I[as_of==max(as_of)], by=.(pricedate, hour)][,V1]]
Here is my sessionInfo():
R version 4.1.1 (2021-08-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8 LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8 LC_PAPER=C.UTF-8
[8] LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] yaml_2.2.1 R.utils_2.10.1 R.oo_1.24.0 R.methodsS3_1.8.1 nanotime_0.3.3 xts_0.12.1 zoo_1.8-9 bit64_4.0.5 bit_4.0.4
[10] glue_1.4.2 magrittr_2.0.1 future_1.21.0 lubridate_1.7.10 data.table_1.14.0 ggplot2_3.3.5 DALEX_2.3.0 mlr3tuning_0.8.0 paradox_0.7.1
[19] mlr3viz_0.5.5 mlr3learners_0.4.5 mlr3_0.12.0 RPostgres_1.3.3
loaded via a namespace (and not attached):
[1] tidyselect_1.1.1 xfun_0.25 purrr_0.3.4 listenv_0.8.0 lattice_0.20-44 colorspace_2.0-2 vctrs_0.3.8 generics_0.1.0
[9] htmltools_0.5.1.1 bbotk_0.3.2 utf8_1.2.2 blob_1.2.2 rlang_0.4.11 pillar_1.6.2 withr_2.4.2 DBI_1.1.1
[17] palmerpenguins_0.1.0 uuid_0.1-4 lifecycle_1.0.0 munsell_0.5.0 gtable_0.3.0 codetools_0.2-18 evaluate_0.14 knitr_1.33
[25] parallel_4.1.1 fansi_0.5.0 Rcpp_1.0.7 scales_1.1.1 backports_1.2.1 checkmate_2.0.0 RcppCCTZ_0.2.9 parallelly_1.27.0
[33] hms_1.1.0 digest_0.6.27 dplyr_1.0.7 grid_4.1.1 tools_4.1.1 tibble_3.1.3 mlr3misc_0.9.3 crayon_1.4.1
[41] pkgconfig_2.0.3 ellipsis_0.3.2 rmarkdown_2.10 lgr_0.4.2 R6_2.5.0 globals_0.14.0 compiler_4.1.1
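Is there something about a (relatively) large data.table that prevents this from working? The machine I'm using is a 64GB VM in the cloud. htop reports only about 3.5GB of memory in use, so roughly 60GB is still available. This isn't holding up my work, so I'm more curious about the answer than anything.
Edit: For the bounty, I'd like to know why eval is only sometimes needed.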
You can use eval to force the i argument to be evaluated in the calling environment:
wind[eval(wind[,.I[as_of==max(as_of)], by=.(pricedate, hour)][,V1])]
pricedate hour type zone as_of wind age daysold
1: 2018-10-04 1 cop_hsl coastal 2018-09-29 00:43:24 712.0 4day 4
2: 2018-10-04 1 stwpf coastal 2018-09-29 00:43:24 751.5 4day 4
3: 2018-10-04 1 wgrpp coastal 2018-09-29 00:43:24 548.2 4day 4
4: 2018-10-04 1 cop_hsl north 2018-09-29 00:43:24 843.0 4day 4
5: 2018-10-04 1 stwpf north 2018-09-29 00:43:24 846.0 4day 4
help('data.table') says:
Advanced: When i is a single variable name, it is not considered an expression of column names and is instead evaluated in calling scope.
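That is why the first solution, where i is the single variable windindx, works, while the combined version, which is evaluated in the wrong scope, does not.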
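The documented rule can be tried on a small table. The following is a minimal sketch, not the original 20-million-row data: the toy values and the name latest are assumptions made for illustration. It shows three forms that avoid evaluating the nested call with the columns of wind in scope.

```r
library(data.table)

# Toy table (hypothetical values); like the original, it has a column named `wind`.
wind <- data.table(
  pricedate = as.Date("2018-10-04"),
  hour      = 1L,
  as_of     = c(1, 2, 2),
  wind      = c(712, 751.5, 548.2)
)

# 1. i is a single variable name: evaluated in the calling scope.
windindx <- wind[, .I[as_of == max(as_of)], by = .(pricedate, hour)][, V1]
by_variable <- wind[windindx]

# 2. i wrapped in eval(): the inner expression is evaluated in the calling scope.
by_eval <- wind[eval(wind[, .I[as_of == max(as_of)], by = .(pricedate, hour)][, V1])]

# 3. A table whose name is not also one of its column names: the nested call
#    resolves `latest` to the table itself, so no eval() is needed.
latest <- copy(wind)
by_rename <- latest[latest[, .I[as_of == max(as_of)], by = .(pricedate, hour)][, V1]]
```

The third form is consistent with the observation in the question that windsm works: windsm is not the name of any column, whereas wind is.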
To elaborate on @Waldi's answer:
> wind[browser()]
Called from: eval(.massagei(isub), x, ienv)
Browse[1]> wind
[1] 712.0 751.5 548.2 843.0 846.0
Browse[1]> wind[,.I[as_of==max(as_of)], by=.(pricedate, hour)][,V1]
Error in .(pricedate, hour) : could not find function "."
Browse[2]> ls()
[1] "age" "as_of" "daysold" "hour" "pricedate" "type" "wind" "zone"
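For the syntax x[i], this is the environment we operate in when i is not a single symbol. This environment includes the columns of x, which are searched first when looking up symbols.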
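If we instead pass a single symbol, as in wind[wind, on=.(hour)], then the lookup happens in the parent environment, rather than i being evaluated in the environment of x's columns.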
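The lookup order can be reproduced in base R without data.table. This is only a sketch: cols is a hypothetical stand-in for the columns of the table, and the two strings stand in for tables living in the calling scope.

```r
# `cols` is a hypothetical stand-in for the columns of the data.table.
cols <- list(wind = c(712, 751.5), hour = c(1L, 1L))

# Hypothetical stand-ins for objects in the calling scope.
wind   <- "whole table (calling scope)"
windsm <- "subset table (calling scope)"
caller <- environment()

# Same shape as eval(.massagei(isub), x, ienv): columns first, caller second.
clash    <- eval(quote(wind), cols, caller)    # the column shadows the variable
no_clash <- eval(quote(windsm), cols, caller)  # no such column, falls through
```

Here clash is the numeric column while no_clash is the object from the calling scope, which mirrors why wind inside the browser prints a numeric vector.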
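I think the documentation quoted in @Waldi's answer is enough to tell us how to avoid this problem, but for what it's worth, this appears to be the relevant part of the code: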
else if (!is.name(isub)) {
ienv = new.env(parent=parent.frame())
if (getOption("datatable.optimize")>=1L) assign("order", forder, ienv)
i = tryCatch(eval(.massagei(isub), x, ienv), error=function(e) {
if (grepl(":=.*defined for use in j.*only", e$message))
stopf("Operator := detected in i, the first argument inside DT[...], but is only valid in the second argument, j. Most often, this happens when forgetting the first comma (e.g. DT[newvar := 5] instead of DT[ , new_var := 5]). Please double-check the syntax. Run traceback(), and debugger() to get a line number.")
else
.checkTypos(e, names_x)
})
} else {
# isub is a single symbol name such as B in DT[B]
i = try(eval(isub, parent.frame(), parent.frame()), silent=TRUE)
...
}
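The branching in that excerpt can be imitated with a few lines of base R. toy_subset() below is a hypothetical, heavily simplified stand-in written for illustration; it is not data.table's implementation and ignores .massagei, the error handling, and everything else the real method does.

```r
# toy_subset() is a hypothetical stand-in for `[.data.table`, not the real thing.
toy_subset <- function(x, i) {
  isub <- substitute(i)
  if (!is.name(isub)) {
    # Compound expression: the columns of x are searched first, then the caller.
    ienv <- new.env(parent = parent.frame())
    eval(isub, x, ienv)
  } else {
    # Single symbol: evaluated in the calling scope only.
    eval(isub, parent.frame(), parent.frame())
  }
}

# A list whose name matches one of its elements, like the table in the question.
wind <- list(wind = c(712, 751.5, 548.2), hour = c(1L, 1L, 1L))

by_symbol <- toy_subset(wind, wind)            # resolves to the whole list
by_call   <- toy_subset(wind, identity(wind))  # resolves to the `wind` element
```

With a bare symbol the whole object comes back; as soon as the symbol is wrapped in a call, the element of the same name wins.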