In data.table: iterating over the rows of another data.table

Question

I have two data.tables:

library(data.table)
k1 <- mtcars[1:4,1:6]
k11 <- as.data.table(k1)
k2 <- iris[1:3,1:2]
k22 <- as.data.table(k2)

I am trying to perform some column operations on the first data.table by iterating over the rows of the second data.table:

k3 <- lapply(seq_len(nrow(k2)), function(j) {
  mpg <- k1[, "mpg"] * k2[j, "Sepal.Width"]   # scale mpg by Sepal.Width from row j of k2
  cyl <- k1[, "cyl"] * k2[j, "Sepal.Length"]  # scale cyl by Sepal.Length from row j of k2
  a3 <- k1[, 3:6]                             # remaining columns, left unchanged
  cbind(mpg, cyl, a3)                         # one data frame per row of k2 (three in total)
})
# rbind the per-row data frames into the final dataset
k4 <- do.call(rbind, k3)

head(k4)
                  mpg  cyl disp  hp drat    wt
Mazda RX4       73.50 30.6  160 110 3.90 2.620
Mazda RX4 Wag   73.50 30.6  160 110 3.90 2.875
Datsun 710      79.80 20.4  108  93 3.85 2.320
Hornet 4 Drive  74.90 30.6  258 110 3.08 3.215

Although the above solution works fine, I would like to know:

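Is there any efficiency gain from using data.table (given that there is no group-by operation here)? The first data frame is 30000 by 10 and the second data frame is 60 by 4, so the final dataset will have 30000 times 60 = 1.8 million rows.
If there is an efficiency gain, how can this be done with data.table?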

Below is my attempt (analogous to the data.frame version):

k5<-rbindlist(lapply(1:nrow(k2),function(j){

  k11[,`:=`(mpg=mpg*k22[j,Sepal.Width],cyl=cyl*k22[j,Sepal.Length])]

}))

 head(k5)
      mpg     cyl disp  hp drat    wt
1: 705.60 704.718  160 110 3.90 2.620
2: 705.60 704.718  160 110 3.90 2.875
3: 766.08 469.812  108  93 3.85 2.320
4: 719.04 704.718  258 110 3.08 3.215
5: 705.60 704.718  160 110 3.90 2.620
6: 705.60 704.718  160 110 3.90 2.875

As you can see, the results are different (I guess because of the way data.table handles copying).

Answer 1

You can use set (this is efficient because it avoids the overhead of [.data.table) and apply * after expanding both datasets to the same number of rows:

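library(data.table)
# repeat all of k11 once per row of k22, and each row of k22 once per row of k11
k1N <- k11[rep(1:.N,nrow(k22))]
# reorder the columns to (Sepal.Width, Sepal.Length) so they line up with (mpg, cyl)
k2N <- k22[rep(1:.N,each=nrow(k11))][, 2:1]

# multiply the first two columns in place
for(j in 1:2){
 set(k1N, i=NULL, j=j, value=k1N[[j]]*k2N[[j]])
}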

k1N
#      mpg  cyl disp  hp drat    wt
# 1: 73.50 30.6  160 110 3.90 2.620
# 2: 73.50 30.6  160 110 3.90 2.875
# 3: 79.80 20.4  108  93 3.85 2.320
# 4: 74.90 30.6  258 110 3.08 3.215
# 5: 63.00 29.4  160 110 3.90 2.620
# 6: 63.00 29.4  160 110 3.90 2.875
# 7: 68.40 19.6  108  93 3.85 2.320
# 8: 64.20 29.4  258 110 3.08 3.215
# 9: 67.20 28.2  160 110 3.90 2.620
#10: 67.20 28.2  160 110 3.90 2.875
#11: 72.96 18.8  108  93 3.85 2.320
#12: 68.48 28.2  258 110 3.08 3.215
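
As for why k5 in the question differs: `:=` appears to be the cause, since it updates k11 by reference. Each pass of the loop multiplies the already-scaled columns again, and every list element points at the same table. The following is only a minimal sketch of a non-mutating version of that loop, assuming the same k11 and k22 as in the question; the local name `out` is introduced here purely for illustration.

```r
library(data.table)

# Rebuild the question's inputs so the sketch runs on its own.
k11 <- as.data.table(mtcars[1:4, 1:6])
k22 <- as.data.table(iris[1:3, 1:2])

k5 <- rbindlist(lapply(seq_len(nrow(k22)), function(j) {
  out <- copy(k11)  # fresh copy, so `:=` below never touches k11 itself
  out[, `:=`(mpg = mpg * k22$Sepal.Width[j],
             cyl = cyl * k22$Sepal.Length[j])]
  out               # return the modified copy for this row of k22
}))

head(k5)
```

With copy(), k11 keeps its original values and each list element is an independent table. For the larger inputs mentioned in the question, the set() approach above avoids the per-iteration copy and is likely the better fit.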


r

data.table