Can someone explain how mult works in data.table when performing an update join (using .EACHI and mult)?

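Once again I am struggling to understand how the mult argument works when updating in a join. What I am trying to do is implement a left join as defined in lj.

For performance reasons, I would like to update the left table.

The "important" part is that when the left and right tables have a column in common (other than the join columns), I want to use the first value in the right table to override the value in the left table. I thought mult would help me deal with this multiple-match issue, but I cannot get it right.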

library(data.table)
X <- data.table(x = c("a", "a", "b", "c", "d"), y = c(0, 1, 1, 2, 2), t = 0:4)
X                 
#        x     y     t
#   <char> <num> <int>
#1:      a     0     0
#2:      a     1     1
#3:      b     1     2
#4:      c     2     3
#5:      d     2     4

Y <- data.table(xx = c("f", "b", "c", "c", "e", "a"), y = c(2, NA, 3, 4, 5, 6), u = 2:7)
Y                 
#       xx     y     u
#   <char> <num> <int>
#1:      f     2     2
#2:      b    NA     3
#3:      c     3     4
#4:      c     4     5
#5:      e     5     6
#6:      a     6     7

# Expected result
#        x     y     t                                                 
#   <char> <num> <int>                                                 
#1:      a     6     0    <= single match on xx == "a" so Y[xx == "a", y] is used                                                
#2:      a     6     1    <= single match on xx == "a" so Y[xx == "a", y] is used                                             
#3:      b    NA     2    <= single match on xx == "b" so Y[xx == "b", y] is used                                             
#4:      c     3     3    <= mult match on xx == "c" so Y[xx == "c", y[1L]] is used                                             
#5:      d     2     4    <= no xx == "d" in Y so nothing changes


copy(X)[Y, y := i.y, by = .EACHI, on = c(x = "xx"), mult = "first"][]
#        x     y     t                                                   
#   <char> <num> <int>                                                   
#1:      a     6     0                                                   
#2:      a     1     1   <= a should always have the same value ie 6                                                
#3:      b    NA     2                                                   
#4:      c     4     3   <= y == 4 is not the first value of y in the Y table                                                
#5:      d     2     4                                                   
    
# Using mult = "all" is the closest I get to the right result
copy(X)[Y, y := i.y, by = .EACHI, on = c(x = "xx"), mult = "all"][]
#        x     y     t                                                 
#   <char> <num> <int>                                                 
#1:      a     6     0                                                 
#2:      a     6     1                                                 
#3:      b    NA     2                                                 
#4:      c     4     3    <= y == 4 is not the first value of y in the Y table                                             
#5:      d     2     4  

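Could someone explain to me what is wrong in the above?

I guess I could use Y[X, ...] to get what I want, but X is very large and the performance I get with Y[X, ...] is much worse.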

In an update join using :=, mult is effectively always "last".

I recall this being described somewhere in the documentation.
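
As a rough illustration of that "last wins" behaviour on the tables from the question, here is a minimal sketch of a plain update join with neither by = .EACHI nor mult. The name res is only illustrative.

```r
library(data.table)

X <- data.table(x = c("a", "a", "b", "c", "d"), y = c(0, 1, 1, 2, 2), t = 0:4)
Y <- data.table(xx = c("f", "b", "c", "c", "e", "a"), y = c(2, NA, 3, 4, 5, 6), u = 2:7)

# Plain update join on a copy of X. Two rows of Y match x == "c"
# (y = 3, then y = 4); they are written in order, so the later one remains.
res <- copy(X)[Y, on = .(x = xx), y := i.y]
res$y
# 6 6 NA 4 2
```

The value for "c" ends up as 4, the last matching row of Y, not 3.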

I'd like to use the first value in the right table to override the value of the left table

Select the first value for each key and update with that alone:

X[unique(Y, by="xx", fromLast=FALSE), on=.(x=xx), y := i.y]

   x  y t
1: a  6 0
2: a  6 1
3: b NA 2
4: c  3 3
5: d  2 4

fromLast= controls whether the first or the last row is kept when dropping duplicates.
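
To make the effect of fromLast= concrete, the following sketch runs the same update on copies of X, once keeping the first row per xx and once keeping the last. The names first_per_key, last_per_key, res_first and res_last are illustrative only.

```r
library(data.table)

X <- data.table(x = c("a", "a", "b", "c", "d"), y = c(0, 1, 1, 2, 2), t = 0:4)
Y <- data.table(xx = c("f", "b", "c", "c", "e", "a"), y = c(2, NA, 3, 4, 5, 6), u = 2:7)

# De-duplicate Y on the join column before the update join.
first_per_key <- unique(Y, by = "xx", fromLast = FALSE)  # keeps y = 3 for "c"
last_per_key  <- unique(Y, by = "xx", fromLast = TRUE)   # keeps y = 4 for "c"

# Each x row now has at most one matching row, so the write order no longer matters.
res_first <- copy(X)[first_per_key, on = .(x = xx), y := i.y]
res_last  <- copy(X)[last_per_key,  on = .(x = xx), y := i.y]

res_first$y
# 6 6 NA 3 2
res_last$y
# 6 6 NA 4 2
```

Only the de-duplicated copy of Y is materialised here; X itself is still updated by reference when copy() is dropped.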


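How multiple matches are handled:

In x[i, mult=], if a row of i has multiple matches in x, mult determines which of the matching rows of x are selected. This explains the results shown in the OP.

In x[i, v := i.v], if multiple rows of i match the same row of x, all of the relevant i rows are written to that x row in order, so the last i row gets the final write. Turn on verbose output to see how many edits are made in an update; in this case the count exceeds the number of rows in x (because rows are edited repeatedly):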

options(datatable.verbose=TRUE)
data.table(a=1,b=2)[.(a=1, b=3:4), on=.(a), b := i.b][]
# Assigning to 2 row subset of 1 rows
   a b
1: 1 4
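
The first point, that mult chooses among the matching rows of x for each row of i, can be checked with a small selection-only sketch on the X from the question. The one-row table i and the result names are hypothetical, introduced just for this illustration.

```r
library(data.table)

X <- data.table(x = c("a", "a", "b", "c", "d"), y = c(0, 1, 1, 2, 2), t = 0:4)
i <- data.table(x = "a")  # a single i row that matches two rows of X

sel_all   <- X[i, on = "x", mult = "all"]    # both matching rows (t = 0 and t = 1)
sel_first <- X[i, on = "x", mult = "first"]  # only the first match (t = 0)
sel_last  <- X[i, on = "x", mult = "last"]   # only the last match (t = 1)

nrow(sel_all)
# 2
sel_first$t
# 0
sel_last$t
# 1
```

This matches the mult = "first" attempt in the question, where only the first of the two "a" rows in X was updated.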