How to find linear regression on all combinations of columns using data.table

Question

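I am trying to fit linear regressions within each group (Species) of the iris dataset for every possible pair of variables. Since this is a toy example, it is easy to run the regression for each pair of variables separately and concatenate the results. For a data.table with a large number of columns, however, doing this by hand for every pair quickly becomes impractical.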

library(data.table)
  dt = copy(iris)
  setDT(dt)[, .(model1 = lm(Sepal.Length ~ Petal.Width, .SD)$coeff[2], model2 = lm(Petal.Width ~ Sepal.Length, .SD)$coeff[2]), by = Species]
      Species    model1     model2
1:     setosa 0.9301727 0.08314444
2: versicolor 1.4263647 0.20935719
3:  virginica 0.6508306 0.12141646

  setDT(dt)[, .(model1 = lm(Sepal.Width ~ Petal.Length, .SD)$coeff[2], model2 = lm(Petal.Length ~ Sepal.Width, .SD)$coeff[2]), by = Species]
      Species    model1    model2
1:     setosa 0.3878739 0.0814112
2: versicolor 0.3743068 0.8393782
3:  virginica 0.2343482 0.6863153

  setDT(dt)[, .(model1 = lm(Sepal.Width ~ Sepal.Length, .SD)$coeff[2], model2 = lm(Sepal.Length ~ Sepal.Width, .SD)$coeff[2]), by = Species]
      Species    model1    model2
1:     setosa 0.7985283 0.6904897
2: versicolor 0.3197193 0.8650777
3:  virginica 0.2318905 0.9015345

  setDT(dt)[, .(model1 = lm(Petal.Width ~ Petal.Length, .SD)$coeff[2], model2 = lm(Petal.Length ~ Petal.Width, .SD)$coeff[2]), by = Species]
      Species    model1    model2
1:     setosa 0.2012451 0.5464903
2: versicolor 0.3310536 1.8693247
3:  virginica 0.1602970 0.6472593

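Instead of running the regression separately for each pair of variables, is there an easy way to do this with data.table? My desired output is as follows -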

      Species   Variable1   Variable2     model1     model2
       setosa Sepal.Length  Petal.Width   0.9301727 0.08314444
   versicolor Sepal.Length  Petal.Width   1.4263647 0.20935719
    virginica Sepal.Length  Petal.Width   0.6508306 0.12141646
       setosa Sepal.Width   Petal.Length  0.3878739 0.0814112
   versicolor Sepal.Width   Petal.Length  0.3743068 0.8393782
    virginica Sepal.Width   Petal.Length  0.2343482 0.6863153
       setosa Sepal.Width   Sepal.Length  0.7985283 0.6904897
   versicolor Sepal.Width   Sepal.Length  0.3197193 0.8650777
    virginica Sepal.Width   Sepal.Length  0.2318905 0.9015345
       setosa Petal.Width   Petal.Length  0.2012451 0.5464903
   versicolor Petal.Width   Petal.Length  0.3310536 1.8693247
    virginica Petal.Width   Petal.Length  0.1602970 0.6472593

Answer 1

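We can use combn over the column names of 'iris' (excluding 'Species') to build a list of formulas with reformulate, then loop over that list within the data grouped by 'Species', apply lm to each formula, and extract the coefficients.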

library(data.table)
lst1 <- combn(names(iris)[-5], 2, FUN = 
      function(x) reformulate(x[1], x[2]), simplify = FALSE)
dt = copy(iris)
out <- setDT(dt)[, lapply(lst1, function(fmla) 
        lm(fmla, .SD)$coeff), 
       by = Species]
setnames(out, -1, sapply(lst1, deparse))

-output (two rows per Species: the intercept first, then the slope)

out
      Species Sepal.Width ~ Sepal.Length Petal.Length ~ Sepal.Length Petal.Width ~ Sepal.Length Petal.Length ~ Sepal.Width Petal.Width ~ Sepal.Width
1:     setosa                 -0.5694327                   0.8030518                -0.17022108                  1.1829224                0.02417907
2:     setosa                  0.7985283                   0.1316317                 0.08314444                  0.0814112                0.06470856
3: versicolor                  0.8721460                   0.1851155                 0.08325571                  1.9349223                0.16690570
4: versicolor                  0.3197193                   0.6864698                 0.20935719                  0.8393782                0.41844560
5:  virginica                  1.4463054                   0.6104680                 1.22610837                  3.5108983                0.66405950
6:  virginica                  0.2318905                   0.7500808                 0.12141646                  0.6863153                0.45794906
   Petal.Width ~ Petal.Length
1:                -0.04822033
2:                 0.20124509
3:                -0.08428835
4:                 0.33105360
5:                 1.13603130
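
The wide result above keeps both the intercept and the slope for each formula. To get closer to the long layout requested in the question (one row per Species and variable pair, with the slope in each direction), a minimal sketch along the following lines may help. The names `vars`, `pairs`, and `res` are illustrative and not part of the original answer. This version covers all 6 pairs produced by combn, whereas the question listed only 4 of them.

```r
library(data.table)

dt <- as.data.table(iris)

# All unordered pairs of the numeric columns (everything except Species)
vars  <- setdiff(names(dt), "Species")
pairs <- combn(vars, 2, simplify = FALSE)

# For each pair, fit both directions within each Species and keep only the slope
res <- rbindlist(lapply(pairs, function(p) {
  dt[, .(
    Variable1 = p[1],
    Variable2 = p[2],
    # model1: slope of Variable1 ~ Variable2
    model1 = coef(lm(reformulate(p[2], response = p[1]), data = .SD))[[2]],
    # model2: slope of Variable2 ~ Variable1
    model2 = coef(lm(reformulate(p[1], response = p[2]), data = .SD))[[2]]
  ), by = Species]
}))

res
```

Each element of `pairs` yields a three-row data.table (one row per Species), and rbindlist stacks them into a single table. With many columns the number of pairs grows quadratically, so fitting time may become the bottleneck rather than the bookkeeping.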
