R data.table: how to distinguish columns with the same name in two different data.tables?

I have two data.tables:

dt1 <- data.table("Symbol1" = c("EURUSD", "USDCAD", "EURUSD", "CADJPY"), "Volume" = c(1, 10, 2, 20)) 
dt2 <- data.table("Symbol" = c("EURUSD", "USDCAD"), "Description" = c("Euro vs Dollar", "Canadian vs Dollar"), key = "Symbol")

I want to add a new column "Description" to the first data.table by reference, looking its values up in the second one (where "Symbol" is the key). I tried the following:

dt1[, Description := dt2[.(Symbol1), Description]]

This works fine. I get the result:

dt1
   Symbol1 Volume        Description
1:  EURUSD      1     Euro vs Dollar
2:  USDCAD     10 Canadian vs Dollar
3:  EURUSD      2     Euro vs Dollar
4:  CADJPY     20               <NA>

But if the two data.tables share a column name (Symbol):

dt1 <- data.table("Symbol" = c("EURUSD", "USDCAD", "EURUSD", "CADJPY"), "Volume" = c(1, 10, 2, 20)) 
dt2 <- data.table("Symbol" = c("EURUSD", "USDCAD"), "Description" = c("Euro vs Dollar", "Canadian vs Dollar"), key = "Symbol")
dt1[, Description := dt2[.(Symbol), Description]]

I get an error:

Error in `[.data.table`(dt1, , `:=`(Description, dt2[.(Symbol), Description])) : 
  Supplied 2 items to be assigned to 4 items of column 'Description'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.

Is there any way to make this work? Thanks for your help!

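Your second attempt fails because the Symbol in dt2[.(Symbol), Description] is resolved from dt2, not from dt1. The lookup therefore returns only 2 descriptions (one per row of dt2), which cannot be assigned to the 4 rows of dt1. In your first attempt, Symbol1 is not found in dt2, so R goes on to look for Symbol1 in dt1. (If that also failed, R would look for Symbol1 in the global environment. So the lookup order matters here.)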
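To make the scoping concrete, here is a minimal sketch using the tables from the question. The variable name lookup is only illustrative; any name that is not a column of dt2 should behave the same way.

```r
library(data.table)

dt1 <- data.table(Symbol = c("EURUSD", "USDCAD", "EURUSD", "CADJPY"),
                  Volume = c(1, 10, 2, 20))
dt2 <- data.table(Symbol = c("EURUSD", "USDCAD"),
                  Description = c("Euro vs Dollar", "Canadian vs Dollar"),
                  key = "Symbol")

# Inside dt2[...], Symbol is dt2's own column, so dt2 is joined to itself
inner <- dt2[.(Symbol), Description]
length(inner)  # 2, one value per row of dt2

# Copy dt1's column to a name dt2 does not have (hypothetical name),
# so the lookup falls through to the calling scope
lookup <- dt1$Symbol
res <- dt2[.(lookup), Description]
length(res)    # 4, one value per row of dt1, NA where there is no match
```

This copy-to-another-name trick is just a way to see the name resolution at work; the join shown next avoids the ambiguity altogether.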

I would do it this way:

dt1 <- data.table("Symbol1" = c("EURUSD", "USDCAD", "EURUSD", "CADJPY"), "Volume" = c(1, 10, 2, 20)) 
dt2 <- data.table("Symbol" = c("EURUSD", "USDCAD"), "Description" = c("Euro vs Dollar", "Canadian vs Dollar"), key = "Symbol")

dt1[dt2, Description := i.Description, on = .(Symbol1 = Symbol)]

dt1 <- data.table("Symbol" = c("EURUSD", "USDCAD", "EURUSD", "CADJPY"), "Volume" = c(1, 10, 2, 20)) 
dt2 <- data.table("Symbol" = c("EURUSD", "USDCAD"), "Description" = c("Euro vs Dollar", "Canadian vs Dollar"), key = "Symbol")

dt1[dt2, Description := i.Description, on = .(Symbol)]
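As a quick check of the update join above, the sketch below rebuilds the tables and inspects the result. It assumes that on = also accepts a character vector of column names and that dt2 does not need a key when on = is given; the variable name original_order is only illustrative.

```r
library(data.table)

dt1 <- data.table(Symbol = c("EURUSD", "USDCAD", "EURUSD", "CADJPY"),
                  Volume = c(1, 10, 2, 20))
# No key is set here, because the join columns are named through on =
dt2 <- data.table(Symbol = c("EURUSD", "USDCAD"),
                  Description = c("Euro vs Dollar", "Canadian vs Dollar"))

original_order <- dt1$Symbol  # hypothetical helper to compare row order

# i.Description refers to the Description column of dt2 (the i table)
dt1[dt2, Description := i.Description, on = "Symbol"]

# dt1 keeps its row order; rows without a match (CADJPY) are left as NA
print(dt1)
```

Because the column is assigned by reference, dt1 is modified in place and no copy of the table is returned.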

Another workaround is to use get and tell it explicitly where to look for Symbol. For example:

dt1[Symbol != "EURUSD", Description := dt2[.(get('Symbol', env = sys.parent(3))), Description]]
dt1
#    Symbol Volume        Description
# 1: EURUSD      1               <NA>
# 2: USDCAD     10 Canadian vs Dollar
# 3: EURUSD      2               <NA>
# 4: CADJPY     20               <NA>

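You can also use merge. Note that it returns a new data.table, sorted by Symbol by default, rather than adding the column to dt1 by reference: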

library(data.table)

merge.data.table(dt1, dt2, by = 'Symbol', all.x = TRUE)

#   Symbol Volume        Description
#1: CADJPY     20               <NA>
#2: EURUSD      1     Euro vs Dollar
#3: EURUSD      2     Euro vs Dollar
#4: USDCAD     10 Canadian vs Dollar