Extract specific lines and make a list of those in R

Question

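I have a file from which I want to extract the numbers that follow segsites: and make a histogram with bins. I wrote some code that checks whether a line starts with the word "segsites", then extracts that line and puts it into a data frame.

However, it does not work as expected. It extracts some numbers, but they do not correspond to the values in my file. I have attached a screenshot to show what the file looks like. It is an example, not the actual file.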

library(dplyr)
library(ggplot2)

 txt <- readLines("file.msOut")

 lns <- (data.frame((beg=which(grepl("segsites:",txt)))))

  output <- cut(lns, breaks = seq(0,1000, by= 100), labels = c("<100","100-200","200-300","300-400","400-500",
                                                         "600-700","700-800,800-900","900-100"))

table(output) %>% 
  as.data.frame() %>% 
  ggplot(aes(x = output, y = Freq)) + 
  geom_col()

Sample data from txt

Answer 1

Using a regex, and assuming txt contains the data from the image:

txt <- c('segsites: 10','test')
as.numeric(gsub('\\D', '', grep('segsites\\:', txt, value = TRUE), perl = TRUE))
# [1] 10
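
For context on why the code in the question returns unexpected numbers: which(grepl(...)) returns the positions of the matching lines, not the values written on them. A minimal sketch of the whole pipeline follows, assuming the values are non-negative integers no larger than 1000; the txt vector is a made-up stand-in for readLines("file.msOut"), and the bin labels are generated from the breaks so that their count always matches the number of intervals.

```r
# Hypothetical stand-in for readLines("file.msOut"); the real file will differ
txt <- c("//", "segsites: 10", "positions: 0.1 0.2",
         "//", "segsites: 250",
         "//", "segsites: 990")

# Keep the lines that start with "segsites:" and strip every non-digit character
seg_lines <- grep("^segsites:", txt, value = TRUE)
segsites  <- as.numeric(gsub("\\D", "", seg_lines))

# Eleven break points define ten intervals, so ten labels are needed
breaks <- seq(0, 1000, by = 100)
labels <- paste(head(breaks, -1), tail(breaks, -1), sep = "-")

# With the defaults, intervals are (lower, upper]; a value of exactly 0 becomes NA
binned <- cut(segsites, breaks = breaks, labels = labels)

# Count the values per bin; empty bins are kept with Freq = 0
counts <- as.data.frame(table(binned))

# Plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(counts, ggplot2::aes(x = binned, y = Freq)) +
    ggplot2::geom_col()
  print(p)
}
```

If zeros can occur in the data, passing include.lowest = TRUE to cut() places them in the first bin instead of producing NA.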

r

file-read