在 R 中向 data.table 添加多列

Adding multiple columns to data.table in R

我有一个 data.table,其中包含序列和读取次数,如下所示:

   sequence   num_reads
1: AACCTGCCG          1
2: CGCGCTCAA         12
3: AGTGTGAGC          3
4: TGGGTACAC         11
5: GGCCGCGTG         15
6: CCTTAAGAG          2
7: GCGGAACTG          9
8: GCGTTGTAG         17
9: GTTGTAGCG         20
10: ACACGTGAC        16

我想使用 data.table 添加两个新列到此 table,基于应用具有两个权重和两个 lambda 的 dpois() 的结果。正确的输出应该是这样的(基于使用 data.frame):

sequence     num_reads                       clus1                       clus2
1  AACCTGCCG         1 2.553269503552647000377e-03 1.610220613932057849571e-03
2  CGCGCTCAA        12 1.053993989051599418361e-02 2.887608256917401083896e-02
3  AGTGTGAGC         3 2.085170094567994833468e-02 1.717568654860860896672e-02
4  TGGGTACAC        11 1.806846838374168498498e-02 4.331412385376097462508e-02
5  GGCCGCGTG        15 1.324248858039188620275e-03 5.415587646672919558410e-03
6  CCTTAAGAG         2 8.936443262434262332916e-03 6.440882455728230530922e-03
7  GCGGAACTG         9 4.056186780023639942838e-02 7.444615037365168164207e-02
8  GCGTTGTAG        17 2.385595369261770803265e-04 1.274255916864215588610e-03
9  GTTGTAGCG        20 1.196285397159046524451e-05 9.538289904012846548518e-05
10 ACACGTGAC        16 5.793588753921446012421e-04 2.707793823336458478163e-03

但是当我尝试使用 data.table 时,我似乎无法获得正确的结果。这是我尝试过的方法(基于围绕该主题提出的类似问题):

    pois = function(n, p, l){return(dpois(as.numeric(as.character(n)), l)*p) }
    x = x[, c(paste("clus", seq(1,2), sep = '')) := pois(num_reads, c(0.4,0.6), c(7,8)), by = seq_len(nrow(x))]

结果如下:

          sequence num_reads                             clus1                      clus2
    1:   AACCTGCCG         1       2.553269503552647000377e-03 2.553269503552647000377e-03
    2:   CGCGCTCAA        12       1.053993989051599418361e-02 1.053993989051599418361e-02
    3:   AGTGTGAGC         3       2.085170094567994833468e-02 2.085170094567994833468e-02
    4:   TGGGTACAC        11       1.806846838374168498498e-02 1.806846838374168498498e-02
    5:   GGCCGCGTG        15       1.324248858039188620275e-03 1.324248858039188620275e-03
    6:   CCTTAAGAG         2       8.936443262434262332916e-03 8.936443262434262332916e-03
    7:   GCGGAACTG         9       4.056186780023639942838e-02 4.056186780023639942838e-02
    8:   GCGTTGTAG        17       2.385595369261770803265e-04 2.385595369261770803265e-04
    9:   GCGTTGTAG        20       1.196285397159046524451e-05 1.196285397159046524451e-05
    10:  ACACGTGAC        16       5.793588753921446012421e-04 5.793588753921446012421e-04

我使用 data.table 而不是 data.frame 的原因是我的真实数据有 100,000 行。我研究了 this and this 的答案,但未能想出解决方案。

如果您有任何提示,我们将不胜感激。谢谢!

我们可以试试

x[,  paste("clus", seq(1,2), sep = ''):= 
      as.list(pois(num_reads, c(0.4,0.6), c(7,8))), by = seq_len(nrow(x))]
x
#    sequence num_reads         clus1        clus2
#1: AACCTGCCG         1 0.00255326950 0.0016102206
#2: CGCGCTCAA        12 0.01053993989 0.0288760826
#3: AGTGTGAGC         3 0.02085170095 0.0171756865
#4: TGGGTACAC        11 0.01806846838 0.0433141239
#5: GGCCGCGTG        15 0.00132424886 0.0054155876
#6: CCTTAAGAG         2 0.00893644326 0.0064408825
#7: GCGGAACTG         9 0.04056186780 0.0744461504
#8: GCGTTGTAG        17 0.00023855954 0.0012742559
#9: GTTGTAGCG        20 0.00001196285 0.0000953829
#10:ACACGTGAC        16 0.00057935888 0.0027077938