R data.table: how to distinguish columns with the same name in two different data.tables?
I have two data.tables:
dt1 <- data.table("Symbol1" = c("EURUSD", "USDCAD", "EURUSD", "CADJPY"), "Volume" = c(1, 10, 2, 20))
dt2 <- data.table("Symbol" = c("EURUSD", "USDCAD"), "Description" = c("Euro vs Dollar", "Canadian vs Dollar"), key = "Symbol")
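I want to add a new column "Description" to the first data.table by reference, taking its values from the second data.table (where "Symbol" is the key).
I tried the following: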
dt1[, Description := dt2[.(Symbol1), Description]]
This works fine. I get the result:
dt1
Symbol1 Volume Description
1: EURUSD 1 Euro vs Dollar
2: USDCAD 10 Canadian vs Dollar
3: EURUSD 2 Euro vs Dollar
4: CADJPY 20 <NA>
But if the two data.tables have the same column name (Symbol):
dt1 <- data.table("Symbol" = c("EURUSD", "USDCAD", "EURUSD", "CADJPY"), "Volume" = c(1, 10, 2, 20))
dt2 <- data.table("Symbol" = c("EURUSD", "USDCAD"), "Description" = c("Euro vs Dollar", "Canadian vs Dollar"), key = "Symbol")
dt1[, Description := dt2[.(Symbol), Description]]
I get an error:
Error in `[.data.table`(dt1, , `:=`(Description, dt2[.(Symbol), Description])) :
Supplied 2 items to be assigned to 4 items of column 'Description'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.
Is there any way to make this work?
Thanks for your help!
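In your second attempt, it fails because the Symbol in dt2[.(Symbol), Description] comes from dt2, not from dt1: inside dt2[...], dt2's own column names take precedence. The lookup is therefore a self-join of dt2 on its own key, which returns only 2 values for the 4 rows of dt1, hence the error. In your first attempt, Symbol1 is not found in dt2, so R looks for Symbol1 in dt1. (If that also failed, R would look for Symbol1 in the global environment. So the lookup order matters here.)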
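A minimal sketch of this scoping rule, using the second pair of tables from the question. The name self_join is only illustrative, and writing dt1$Symbol explicitly is just one way (not the only one) to stop dt2's own Symbol column from shadowing dt1's:

```r
library(data.table)

dt1 <- data.table(Symbol = c("EURUSD", "USDCAD", "EURUSD", "CADJPY"), Volume = c(1, 10, 2, 20))
dt2 <- data.table(Symbol = c("EURUSD", "USDCAD"),
                  Description = c("Euro vs Dollar", "Canadian vs Dollar"),
                  key = "Symbol")

# Inside dt2[...], Symbol resolves to dt2's own key column, so this is a
# self-join of dt2 and yields one value per row of dt2 (2 values, not 4)
self_join <- dt2[.(Symbol), Description]
length(self_join)

# dt1 is not a column of dt2, so dt1$Symbol is looked up outside dt2 and
# supplies one lookup value per row of dt1 (unmatched CADJPY gives NA)
dt1[, Description := dt2[.(dt1$Symbol), Description]]
print(dt1)
```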
I would do it like this:
# Different column names (Symbol1 in dt1, Symbol in dt2)
dt1 <- data.table("Symbol1" = c("EURUSD", "USDCAD", "EURUSD", "CADJPY"), "Volume" = c(1, 10, 2, 20))
dt2 <- data.table("Symbol" = c("EURUSD", "USDCAD"), "Description" = c("Euro vs Dollar", "Canadian vs Dollar"), key = "Symbol")
# Update join: match dt1$Symbol1 to dt2$Symbol; i.Description is the column of dt2 (the table in i)
dt1[dt2, Description := i.Description, on = .(Symbol1 = Symbol)]
# Same column name in both tables: on = .(Symbol) is enough
dt1 <- data.table("Symbol" = c("EURUSD", "USDCAD", "EURUSD", "CADJPY"), "Volume" = c(1, 10, 2, 20))
dt2 <- data.table("Symbol" = c("EURUSD", "USDCAD"), "Description" = c("Euro vs Dollar", "Canadian vs Dollar"), key = "Symbol")
dt1[dt2, Description := i.Description, on = .(Symbol)]
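A runnable check of the same-name update join, assuming data.table is installed. It confirms that dt1 keeps its row order, that the unmatched CADJPY row stays NA, and (via data.table's address()) that dt1 is modified in place rather than copied. The variable name before is just illustrative:

```r
library(data.table)

dt1 <- data.table(Symbol = c("EURUSD", "USDCAD", "EURUSD", "CADJPY"), Volume = c(1, 10, 2, 20))
dt2 <- data.table(Symbol = c("EURUSD", "USDCAD"),
                  Description = c("Euro vs Dollar", "Canadian vs Dollar"),
                  key = "Symbol")

# Record where dt1 lives in memory before the update
before <- address(dt1)

# For every row of dt1 whose Symbol matches a row of dt2, copy dt2's
# Description into a new column of dt1; no reassignment is needed
dt1[dt2, Description := i.Description, on = .(Symbol)]

# Same object, original row order, NA where there was no match
print(dt1)
identical(before, address(dt1))
```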
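Another workaround is to use get and tell it explicitly where to look for Symbol. For example:
# get() looks up Symbol in the frame given by sys.parent(3) instead of in dt2;
# that frame depth depends on how data.table nests its internal calls, so this is fragile
dt1[Symbol != "EURUSD", Description := dt2[.(get('Symbol', envir = sys.parent(3))), Description]]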
dt1
# Symbol Volume Description
# 1: EURUSD 1 <NA>
# 2: USDCAD 10 Canadian vs Dollar
# 3: EURUSD 2 <NA>
# 4: CADJPY 20 <NA>
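A possibly less fragile variant of the same idea is sketched below; it is not part of the answer above. Inside dt1's j expression, Symbol still refers to dt1's (subsetted) column, so copying it into a local variable before calling dt2[...] avoids the name clash without relying on frame numbers. The variable name syms is hypothetical; it only needs to differ from dt2's column names:

```r
library(data.table)

dt1 <- data.table(Symbol = c("EURUSD", "USDCAD", "EURUSD", "CADJPY"), Volume = c(1, 10, 2, 20))
dt2 <- data.table(Symbol = c("EURUSD", "USDCAD"),
                  Description = c("Euro vs Dollar", "Canadian vs Dollar"),
                  key = "Symbol")

dt1[Symbol != "EURUSD",
    Description := {
      # Here Symbol is dt1's column, restricted to the rows selected in i
      syms <- Symbol
      # syms is not a column of dt2, so inside dt2[...] it is found in
      # dt1's j environment instead of being shadowed by dt2's Symbol
      dt2[.(syms), Description]
    }]

# Rows excluded by i (EURUSD) and the unmatched CADJPY stay NA
print(dt1)
```

Dropping the Symbol != "EURUSD" filter should fill the column for every row, as in the question's first attempt.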
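You can use merge. Note that it returns a new data.table instead of adding the column to dt1 by reference:
library(data.table)
# all.x = TRUE keeps every row of dt1 (a left join); merge() dispatches to the data.table method
merge(dt1, dt2, by = 'Symbol', all.x = TRUE)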
# Symbol Volume Description
#1: CADJPY 20 <NA>
#2: EURUSD 1 Euro vs Dollar
#3: EURUSD 2 Euro vs Dollar
#4: USDCAD 10 Canadian vs Dollar
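A small sketch contrasting merge with the update join, assuming fresh copies of the tables from the question: merge() leaves dt1 unchanged and returns a new table sorted by Symbol, whereas an update join keeps dt1's row order and modifies it in place. The name merged is illustrative:

```r
library(data.table)

dt1 <- data.table(Symbol = c("EURUSD", "USDCAD", "EURUSD", "CADJPY"), Volume = c(1, 10, 2, 20))
dt2 <- data.table(Symbol = c("EURUSD", "USDCAD"),
                  Description = c("Euro vs Dollar", "Canadian vs Dollar"),
                  key = "Symbol")

# Left join into a brand-new data.table; rows come back sorted by Symbol
merged <- merge(dt1, dt2, by = "Symbol", all.x = TRUE)
print(merged)

# dt1 itself has not gained a Description column
"Description" %in% names(dt1)
```

To keep the merged result you would reassign it (dt1 <- merged), which replaces dt1 rather than updating it by reference.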