How do I extract text elements between \t and \

Question

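I'm working with some very messy data in txt format, and I'm struggling to extract specific values into variables. A simple example illustrates the problem. I want to extract METACODE: xxxx, where xxxx is an arbitrary number that depends on the metacode, 1234 in this example. When I inspect the file after loading it, the relevant string looks like this: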

"\"\t\tMETACODE:\t\t\t1234\""

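Now I want to extract 1234 (the xxxx part), and I need this to work across several txt files. I tried using stringr, following another Stack Overflow example, but that example did not involve \t and \. I tried: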

library(stringr)
metacode <- str_match(textfile, "METACODE:\t\t\t\\s*(.*?)\\")

where textfile is the variable holding the text I read in with readLines. I get the following error:

Error in stri_match_first_regex(string, pattern, opts_regex = opts(pattern)) : 
  Unrecognized backslash escape sequence in pattern. (U_REGEX_BAD_ESCAPE_SEQUENCE)

Any good ideas for handling \t, \n, and the like? A simple example would be much appreciated.

Answer 1

Match the digits instead of searching for the backslash:

library(stringr)
textfile <- "\"\t\tMETACODE:\t\t\t1234\""
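# Regex escapes such as \s and \d need a doubled backslash inside an R string
metacode <- str_match(textfile, "METACODE:\t\t\t\\s*(\\d+)")
# Column 2 of the result matrix holds the first capture group
metacode[,2]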

Result: [1] "1234"

See proof.
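Since the question mentions several txt files, the same idea can be sketched for a whole vector of lines. This is only an illustration: the sample lines below are hypothetical stand-ins for what readLines might return, and the pattern uses \\s* directly after the colon so that the number of tabs does not have to be exactly three.

```r
library(stringr)

# Hypothetical sample lines, standing in for the output of readLines()
lines <- c(
  "\"\t\tMETACODE:\t\t\t1234\"",
  "some unrelated line",
  "\"\tMETACODE:\t5678\""
)

# \\s* tolerates any number of tabs or spaces after the colon
m <- str_match(lines, "METACODE:\\s*(\\d+)")

# Column 2 holds the capture group; lines without a match give NA
codes <- m[, 2]
codes <- codes[!is.na(codes)]
codes
```

Dropping the NA entries keeps only the lines that actually contain a metacode.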

Explanation of the expression

--------------------------------------------------------------------------------
  METACODE:                'METACODE:'
--------------------------------------------------------------------------------
  \t                       '\t' (tab)
--------------------------------------------------------------------------------
  \t                       '\t' (tab)
--------------------------------------------------------------------------------
  \t                       '\t' (tab)
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    \d+                      digits (0-9) (1 or more times (matching
                             the most amount possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
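
The error in the question comes from the two layers of escaping involved: R interprets the string literal first, and the regex engine then interprets whatever characters remain. The following is a minimal sketch of that distinction, not a definitive treatment; the variable names are illustrative, and the raw-string form assumes R 4.0.0 or later.

```r
library(stringr)

x <- "\"\t\tMETACODE:\t\t\t1234\""

# "\t" in an R string literal is already one tab character;
# "\\t" is two characters, a backslash followed by the letter t
len_tab     <- nchar("\t")
len_escaped <- nchar("\\t")

# Both patterns match a tab: the first hands the regex engine a literal tab,
# the second hands it the regex escape \t
hit_literal <- str_detect(x, "METACODE:\t")
hit_escaped <- str_detect(x, "METACODE:\\t")

# \s and \d are not valid R string escapes, so the backslash must be doubled
code_doubled <- str_match(x, "METACODE:\\s*(\\d+)")[, 2]

# Raw strings (assumes R >= 4.0.0) pass backslashes through unchanged
code_raw <- str_match(x, r"[METACODE:\s*(\d+)]")[, 2]
```

A pattern that ends in a single regex backslash, as in the attempt from the question, leaves the engine with an incomplete escape, which is what U_REGEX_BAD_ESCAPE_SEQUENCE reports.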
