What is the purpose of "r" before "..." in DataFrames.jl

I've noticed people using r"..". What does it do? Thanks.

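r"..." is Julia syntax for defining a regular expression, and it is used throughout the language wherever a regex is needed (not only with data frames). You can find more information about this syntax by searching for r"" in the built-in help of the Julia REPL: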

help?> r""
  @r_str -> Regex

  Construct a regex, such as r"^[a-z]*$", without interpolation and unescaping (except
  for quotation mark " which still has to be escaped). The regex also accepts one or
  more flags, listed after the ending quote, to change its behaviour:

    •  i enables case-insensitive matching

    •  m treats the ^ and $ tokens as matching the start and end of individual
       lines, as opposed to the whole string.

    •  s allows the . modifier to match newlines.

    •  x enables "comment mode": whitespace is enabled except when escaped with \,
       and # is treated as starting a comment.

    •  a disables UCP mode (enables ASCII mode). By default \B, \b, \D, \d, \S, \s,
       \W, \w, etc. match based on Unicode character properties. With this option,
       these sequences only match ASCII characters.

  See Regex if interpolation is needed.

  Examples
  ≡≡≡≡≡≡≡≡≡≡

  julia> match(r"a+.*b+.*?d$"ism, "Goodbye,\nOh, angry,\nBad world\n")
  RegexMatch("angry,\nBad world")

  This regex has the first three flags enabled.
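
As a small sketch of two points from the help text above (flags after the closing quote, and using the Regex constructor when interpolation is needed), something like the following should work. The variable names `prefix` and `re` are just illustrative choices of mine, not anything from the docs:

```julia
# The i flag after the closing quote makes matching case-insensitive.
case_insensitive = occursin(r"julia"i, "I like JULIA")

# r"..." does not interpolate, so build the pattern from an ordinary string
# and pass it to Regex(...). In an ordinary string the backslash and the
# trailing dollar sign must be escaped.
prefix = "b"                        # illustrative variable
re = Regex("^$(prefix)\\d+\$")      # same pattern text as r"^b\d+$"

starts_with_b = occursin(re, "b12")    # "b12" matches the anchored pattern
not_anchored  = occursin(re, "ab12")   # "ab12" does not start with "b"
```

The trade-off is that the Regex("...") form needs doubled backslashes, which is why the r"..." literal is usually preferred when no interpolation is required.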

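More generally, the pattern of a word or letter placed immediately before a quoted string is called a string macro (or non-standard string literal), and you can even define your own (as some packages do). The r"..." syntax happens to be built in and is specifically for defining regex objects, which can later be applied to one or more strings with functions such as match and replace.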
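
To make that concrete, here is a minimal sketch. The first part applies regex literals with match and replace; the second defines a toy string macro. The macro name `up_str` (and hence the `up"..."` prefix) is hypothetical and made up purely for illustration, it is not part of Julia or any package:

```julia
# Apply a regex literal with match: capture two numbers separated by a dash.
m = match(r"(\d+)-(\d+)", "range: 10-20")
lo = parse(Int, m.captures[1])
hi = parse(Int, m.captures[2])

# Apply a regex literal with replace: every run of digits becomes "#".
cleaned = replace("a1b22c333", r"\d+" => "#")

# A string macro is an ordinary macro whose name ends in _str.
# Hypothetical example: up"..." uppercases its literal at macro-expansion time.
macro up_str(s)
    return uppercase(s)
end

shout = up"hello"
```

The r"..." literal works the same way: it is the macro @r_str shown in the help output above, which turns the string into a Regex object.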

@cbk gave a good overview of how r"..." regular expressions are used in Julia.

In DataFrames.jl you can use regular expressions as column selectors. Here are some examples, where r"b" matches all columns whose name contains "b" somewhere:

julia> using DataFrames

julia> df = DataFrame(a=1, b1=2, b2=3, c=4)
1×4 DataFrame
 Row │ a      b1     b2     c
     │ Int64  Int64  Int64  Int64
─────┼────────────────────────────
   1 │     1      2      3      4

julia> df[:, r"b"] # data frame indexing
1×2 DataFrame
 Row │ b1     b2
     │ Int64  Int64
─────┼──────────────
   1 │     2      3

julia> select(df, r"b") # selection operation
1×2 DataFrame
 Row │ b1     b2
     │ Int64  Int64
─────┼──────────────
   1 │     2      3

julia> combine(df, AsTable(r"b") => ByRow(sum)) # rowwise aggregation of selected columns
1×1 DataFrame
 Row │ b1_b2_sum
     │ Int64
─────┼───────────
   1 │         5
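
A few more variations on the same idea, as a sketch that assumes DataFrames.jl is installed and reuses the `df` from above. The result variable names are mine; `names`, `Not` and `Cols` are selectors provided by DataFrames.jl:

```julia
using DataFrames

df = DataFrame(a=1, b1=2, b2=3, c=4)

# Anchoring the pattern restricts the match to names that start with "b".
b_names = names(df, r"^b")

# Not(...) inverts the selection: keep every column whose name has no "b".
without_b = select(df, Not(r"b"))

# Cols(...) combines selectors; columns appear in the order the selectors match.
b_then_a = df[:, Cols(r"b", :a)]
```

Here `b_names` should be ["b1", "b2"], `without_b` should keep columns a and c, and `b_then_a` should hold b1, b2 and a in that order.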