How to map segment boundaries to the closest position in a reference file in R

How can I map the coordinates in a segment file (the Start and End positions) to the closest positions in a reference file?

seg <- Sample   Chromosome  Start        End        Num_markers LogRatio
        Nf1         1      3020000.5    195340000.5     4732    0.2981
        Nf2         2      3100000.5    181980000.5     4091    0.2986

Ref <- Name       Chromosome    Position
     1:3010000.5        1       3010000.5
     1:195330000.5      1     195330000.5
     2:3090000.5        2       3090000.5
     2:181970000.5      2     181970000.5

Expected output

result <- Sample  Chromosome    Start     End       Num_markers LogRatio
          Nf1       1         3010000.5 195330000.5  4732        0.2981
          Nf2       2         3090000.5 181970000.5  4091        0.2986

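Using data.table, you can do this with two rolling joins, each specifying roll = "nearest". You need two joins because each one joins on a different column (Start, then End), but this should still be very efficient. Here is a possible implementation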

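library(data.table)
setDT(seg)  # convert both data.frames to data.tables by reference
setDT(Ref)
# Row index in Ref of the nearest Position for each Start / End.
# Chromosome is matched exactly; only the last column in `on` is rolled.
StartInd <- Ref[seg, on = c(Chromosome = "Chromosome", Position = "Start"), which = TRUE, roll = "nearest"]
EndInd <- Ref[seg, on = c(Chromosome = "Chromosome", Position = "End"), which = TRUE, roll = "nearest"]
# Overwrite Start and End in place with the matched reference positions
seg[, `:=`(Start = Ref[StartInd, Position], End = Ref[EndInd, Position])]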
print(seg, digits = 10)
#    Sample Chromosome     Start         End Num_markers LogRatio
# 1:    Nf1          1 3010000.5 195330000.5        4732   0.2981
# 2:    Nf2          2 3090000.5 181970000.5        4091   0.2986
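
If you want to try this end to end, here is a minimal self-contained sketch. It assumes the two tables look like the ones in the question (Chromosome as integer, positions as numeric). The helper name nearest_ref_position is hypothetical and just wraps the same rolling join so it is not written out twice. The stopifnot check is my own addition: a segment on a chromosome that does not appear in Ref comes back as NA from the join, and it is probably better to fail loudly than to write NA into the table.

```r
library(data.table)

# Toy data copied from the question
seg <- data.table(
  Sample      = c("Nf1", "Nf2"),
  Chromosome  = c(1L, 2L),
  Start       = c(3020000.5, 3100000.5),
  End         = c(195340000.5, 181980000.5),
  Num_markers = c(4732L, 4091L),
  LogRatio    = c(0.2981, 0.2986)
)

Ref <- data.table(
  Name       = c("1:3010000.5", "1:195330000.5", "2:3090000.5", "2:181970000.5"),
  Chromosome = c(1L, 1L, 2L, 2L),
  Position   = c(3010000.5, 195330000.5, 3090000.5, 181970000.5)
)

# Hypothetical helper: for each row of `seg`, return the reference Position
# on the same chromosome that is nearest to the value in column `col`.
nearest_ref_position <- function(seg, Ref, col) {
  idx <- Ref[seg,
             on = c(Chromosome = "Chromosome", Position = col),
             which = TRUE, roll = "nearest"]
  Ref$Position[idx]
}

new_start <- nearest_ref_position(seg, Ref, "Start")
new_end   <- nearest_ref_position(seg, Ref, "End")

# Assumption: every segment chromosome is present in Ref; stop otherwise
stopifnot(!anyNA(new_start), !anyNA(new_end))

# Update by reference, as in the answer above
seg[, `:=`(Start = new_start, End = new_end)]
print(seg, digits = 10)
```

Computing both vectors before the `:=` matters if you adapt this: the End lookup has to see the original End column, not one that has already been overwritten.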