How to map the segment boundaries to the closest position in a reference file in R
How can I map the coordinate positions in a segment file (the Start and End positions) to the nearest positions in a reference file?
seg <-
Sample  Chromosome  Start      End          Num_markers  LogRatio
Nf1     1           3020000.5  195340000.5  4732         0.2981
Nf2     2           3100000.5  181980000.5  4091         0.2986
Ref <-
Name           Chromosome  Position
1:3010000.5    1           3010000.5
1:195330000.5  1           195330000.5
2:3090000.5    2           3090000.5
2:181970000.5  2           181970000.5
Expected output
result <-
Sample  Chromosome  Start      End          Num_markers  LogRatio
Nf1     1           3010000.5  195330000.5  4732         0.2981
Nf2     2           3090000.5  181970000.5  4091         0.2986
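Using data.table, you can perform two rolling joins with roll = "nearest". You need to join twice because each join uses a different column (Start, then End), but this should still be very efficient. Here is one possible implementation: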
library(data.table)
# Convert both inputs to data.tables by reference
setDT(seg)
setDT(Ref)
# Match exactly on Chromosome and roll to the nearest Position;
# which = TRUE returns the matching row numbers of Ref
StartInd <- Ref[seg, on = c(Chromosome = "Chromosome", Position = "Start"), which = TRUE, roll = "nearest"]
EndInd <- Ref[seg, on = c(Chromosome = "Chromosome", Position = "End"), which = TRUE, roll = "nearest"]
# Overwrite Start and End in place with the matched reference positions
seg[, `:=`(Start = Ref[StartInd, Position], End = Ref[EndInd, Position])]
print(seg, digits = 10)
# Sample Chromosome Start End Num_markers LogRatio
# 1: Nf1 1 3010000.5 195330000.5 4732 0.2981
# 2: Nf2 2 3090000.5 181970000.5 4091 0.2986
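For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. It builds the two tables from the question by hand and wraps the rolling join in a small helper. The helper name snap_to_nearest and the result name snapped are made up for illustration, and the column types (integer Chromosome, numeric positions) are an assumption about how the real files would be read in.

```r
library(data.table)

# Example inputs copied from the question (column types are assumed)
seg <- data.table(
  Sample      = c("Nf1", "Nf2"),
  Chromosome  = c(1L, 2L),
  Start       = c(3020000.5, 3100000.5),
  End         = c(195340000.5, 181980000.5),
  Num_markers = c(4732L, 4091L),
  LogRatio    = c(0.2981, 0.2986)
)

Ref <- data.table(
  Name       = c("1:3010000.5", "1:195330000.5", "2:3090000.5", "2:181970000.5"),
  Chromosome = c(1L, 1L, 2L, 2L),
  Position   = c(3010000.5, 195330000.5, 3090000.5, 181970000.5)
)

# Hypothetical helper: for each row of seg, return the Ref position on the
# same chromosome that is nearest to the value in column `col`
snap_to_nearest <- function(seg, Ref, col) {
  # Names are Ref columns, values are seg columns; the roll applies to
  # the last join column (Position)
  join_cols <- c(Chromosome = "Chromosome", Position = col)
  idx <- Ref[seg, on = join_cols, which = TRUE, roll = "nearest"]
  Ref$Position[idx]
}

# Work on a copy so the original seg is left untouched
snapped <- copy(seg)
snapped[, Start := snap_to_nearest(seg, Ref, "Start")]
snapped[, End   := snap_to_nearest(seg, Ref, "End")]

print(snapped, digits = 10)
```

Using copy() keeps the original boundaries available for comparison, whereas the version above updates seg by reference. A segment on a chromosome that does not appear in Ref has no matching row, so its snapped position comes back as NA.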