Getting Error: Could not find function "." for only one data.table using .I

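I have looked at several questions with similar titles, and I believe my situation is different. To be sure, I stopped my RStudio server, uninstalled data.table, and then reinstalled it from source before restarting RStudio server.

I have a data.table that looks like: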

wind<-    structure(list(pricedate = structure(c(1538629200, 1538629200, 
                                       1538629200, 1538629200, 1538629200), class = c("POSIXct", "POSIXt"
                                       ), tzone = "America/Chicago"), hour = c(1L, 1L, 1L, 1L, 1L), 
               type = c("cop_hsl", "stwpf", "wgrpp", "cop_hsl", "stwpf"), 
               zone = c("coastal", "coastal", "coastal", "north", "north"
               ), as_of = structure(c(1538199804, 1538199804, 1538199804, 
                                      1538199804, 1538199804), class = c("POSIXct", "POSIXt"), tzone = "America/Chicago"), 
               wind = c(712, 751.5, 548.2, 843, 846), age = c("4day", "4day", 
                                                              "4day", "4day", "4day"), daysold = c(4L, 4L, 4L, 4L, 4L)), row.names = c(NA, 
                                                                                                                                       -5L), class = c("data.table", "data.frame"))

The full table has about 20 million rows and occupies 1.1GB of memory, as reported by tables().

The following commands work:

windindx<-wind[,.I[as_of==max(as_of)], by=.(pricedate, hour)][,V1]
wind[windindx]

Combining them into:

wind[wind[,.I[as_of==max(as_of)], by=.(pricedate, hour)][,V1]]

results in Error: could not find function "."

If I take a subset of the data.table and give it a different name, then it works, like this:

windsm<-wind[round(runif(10000000,0,20676204))]
windsm[windsm[,.I[as_of==max(as_of)], by=.(pricedate, hour)][,V1]]

Here is my sessionInfo():

R version 4.1.1 (2021-08-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

locale:
  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8        LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8    LC_PAPER=C.UTF-8      
[8] LC_NAME=C              LC_ADDRESS=C           LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
  [1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
  [1] yaml_2.2.1         R.utils_2.10.1     R.oo_1.24.0        R.methodsS3_1.8.1  nanotime_0.3.3     xts_0.12.1         zoo_1.8-9          bit64_4.0.5        bit_4.0.4         
[10] glue_1.4.2         magrittr_2.0.1     future_1.21.0      lubridate_1.7.10   data.table_1.14.0  ggplot2_3.3.5      DALEX_2.3.0        mlr3tuning_0.8.0   paradox_0.7.1     
[19] mlr3viz_0.5.5      mlr3learners_0.4.5 mlr3_0.12.0        RPostgres_1.3.3   

loaded via a namespace (and not attached):
  [1] tidyselect_1.1.1     xfun_0.25            purrr_0.3.4          listenv_0.8.0        lattice_0.20-44      colorspace_2.0-2     vctrs_0.3.8          generics_0.1.0      
[9] htmltools_0.5.1.1    bbotk_0.3.2          utf8_1.2.2           blob_1.2.2           rlang_0.4.11         pillar_1.6.2         withr_2.4.2          DBI_1.1.1           
[17] palmerpenguins_0.1.0 uuid_0.1-4           lifecycle_1.0.0      munsell_0.5.0        gtable_0.3.0         codetools_0.2-18     evaluate_0.14        knitr_1.33          
[25] parallel_4.1.1       fansi_0.5.0          Rcpp_1.0.7           scales_1.1.1         backports_1.2.1      checkmate_2.0.0      RcppCCTZ_0.2.9       parallelly_1.27.0   
[33] hms_1.1.0            digest_0.6.27        dplyr_1.0.7          grid_4.1.1           tools_4.1.1          tibble_3.1.3         mlr3misc_0.9.3       crayon_1.4.1        
[41] pkgconfig_2.0.3      ellipsis_0.3.2       rmarkdown_2.10       lgr_0.4.2            R6_2.5.0             globals_0.14.0       compiler_4.1.1   

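Is there something about the (relatively) large data.table that prevents this from working? The machine I am using is a 64GB VM in the cloud. htop reports only about 3.5GB of memory in use, so about 60GB is still available. My workload is not very heavy, so I am more curious about the answer than anything else.

Edit: for the bounty, I would like to know why eval is only sometimes needed.

You can use eval to force the i argument to be evaluated in the calling environment rather than among the columns of wind: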

wind[eval(wind[,.I[as_of==max(as_of)], by=.(pricedate, hour)][,V1])]

    pricedate hour    type    zone               as_of  wind  age daysold
1: 2018-10-04    1 cop_hsl coastal 2018-09-29 00:43:24 712.0 4day       4
2: 2018-10-04    1   stwpf coastal 2018-09-29 00:43:24 751.5 4day       4
3: 2018-10-04    1   wgrpp coastal 2018-09-29 00:43:24 548.2 4day       4
4: 2018-10-04    1 cop_hsl   north 2018-09-29 00:43:24 843.0 4day       4
5: 2018-10-04    1   stwpf   north 2018-09-29 00:43:24 846.0 4day       4

help('data.table') says:

Advanced: When i is a single variable name, it is not considered an expression of column names and is instead evaluated in calling scope.

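This is why the first solution, where i is the single variable windindx, works, while the combined version does not: its i is an expression, so it is evaluated in a scope where wind refers to the column named wind rather than to the table.

To elaborate on @Waldi's answer: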
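A small reproduction may make the name clash concrete. The sketch below uses a made-up three-row table (the values and the name wind_dt are illustrative assumptions, not the asker's data). With data.table 1.14.0, as in the question, the combined call is expected to fail, so it is wrapped in tryCatch rather than assumed to error on every version.

```r
library(data.table)

# Toy table whose name matches one of its own columns, as in the question.
wind <- data.table(
  pricedate = as.Date("2018-10-04"),
  hour      = 1L,
  as_of     = c(1, 2, 2),
  wind      = c(712, 751.5, 548.2)
)

# Two-step version: i is the single symbol idx, looked up in the calling scope.
idx <- wind[, .I[as_of == max(as_of)], by = .(pricedate, hour)][, V1]
two_step <- wind[idx]

# One-step version: i is an expression, so `wind` inside it resolves to the
# numeric column first. The message is captured instead of stopping the script.
one_step <- tryCatch(
  wind[wind[, .I[as_of == max(as_of)], by = .(pricedate, hour)][, V1]],
  error = function(e) conditionMessage(e)
)

# A table name that is not also a column name avoids the clash (hypothetical name).
wind_dt <- wind
renamed <- wind_dt[wind_dt[, .I[as_of == max(as_of)], by = .(pricedate, hour)][, V1]]
```

This would also explain why windsm worked in the question: windsm is not a column name, so the size of the table is not the deciding factor.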

> wind[browser()]
Called from: eval(.massagei(isub), x, ienv)
Browse[1]> wind
[1] 712.0 751.5 548.2 843.0 846.0
Browse[1]> wind[,.I[as_of==max(as_of)], by=.(pricedate, hour)][,V1]
Error in .(pricedate, hour) : could not find function "."
Browse[2]> ls()
[1] "age"       "as_of"     "daysold"   "hour"      "pricedate" "type"      "wind"      "zone"     

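For the syntax x[i], this is the environment we operate in when i is not a single symbol. This environment includes the columns of x, which are searched first when a symbol is looked up.

If we instead pass a single symbol, as in wind[wind, on=.(hour)], then it is looked up in the parent environment, without evaluating i in the environment of the columns of x.

I think the documentation quoted in @Waldi's answer is enough to tell us how to avoid the problem, but for what it's worth, this appears to be the relevant part of the code: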

    else if (!is.name(isub)) {
      ienv = new.env(parent=parent.frame())
      if (getOption("datatable.optimize")>=1L) assign("order", forder, ienv)
      i = tryCatch(eval(.massagei(isub), x, ienv), error=function(e) {
        if (grepl(":=.*defined for use in j.*only", e$message))
          stopf("Operator := detected in i, the first argument inside DT[...], but is only valid in the second argument, j. Most often, this happens when forgetting the first comma (e.g. DT[newvar := 5] instead of DT[ , new_var := 5]). Please double-check the syntax. Run traceback(), and debugger() to get a line number.")
        else
          .checkTypos(e, names_x)
      })
    } else {
      # isub is a single symbol name such as B in DT[B]
      i = try(eval(isub, parent.frame(), parent.frame()), silent=TRUE)
      ...
    }
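The two branches can be mimicked with base R's eval alone, which may help show why the same symbol resolves differently. This is only an analogy under simplifying assumptions: x_cols is a hypothetical plain list standing in for the columns of x, and data.table's own processing of i (such as .massagei) is left out.

```r
# Stand-in for the columns of x (hypothetical; a plain list, not a data.table).
x_cols <- list(wind = c(712, 751.5, 548.2))

# An object in the calling scope that shares its name with a column.
wind <- "the table in the calling scope"

# Like the compound-i branch: the columns are searched first, then the caller.
in_columns <- eval(quote(wind), x_cols, environment())

# Like the single-symbol branch: evaluated directly in the calling scope.
in_caller <- eval(quote(wind), environment(), environment())

is.numeric(in_columns)   # the column shadows the outer object
is.character(in_caller)  # the outer object is found
```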