R Regex: Match String between first and last space

Question

I have an R data frame with a column containing strings of the following type:

DBR 0 1/2 02/15/25
FRTR 3 04/25/22
BTPS 1.35 04/15/22

I want to use a regex to match the part of the string between the first and the last space.

So the output would be:

0 1/2
3
1.35

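Some background information:

These are bond descriptions. The first segment of each line is the country key (DBR = Germany). The last segment is the maturity date (February 15, 2025 for the first bond).

Between the country key and the maturity date, the bond's coupon is written using several different conventions. For example, the German bond has a coupon of 0.5%, the second (French) bond has a coupon of 3%, and the last (Italian) bond has a coupon of 1.35%.

I have already figured out how to match the country key and the maturity date using: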

^[^\s]+ (for the country key)
[^\s]+$ (for the maturity date)

After the match, I then want to convert the coupon into a uniform format for further calculations:

0 1/2 > 0.5
3 > 3.0
1.35 > 1.35

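The mixed coupon formats are also the reason I want to extract only what lies between the first and the last space. The first bond, for example, has an additional space inside the coupon.

Thanks.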

Answer 1

Strings <- c("DBR 0 1/2 02/15/25", "FRTR 3 04/25/22", "BTPS 1.35 04/15/22")
sub(".*?\\s+(.*)\\s.*", "\\1", Strings, perl=TRUE)
[1] "0 1/2" "3"     "1.35"

Some detail (in an R string literal each backslash is doubled, so \s is written "\\s"):

.*?   matches anything, lazily, so it stops at the first match of what follows
\s+   matches one or more whitespace characters
(.*)  matches any number of characters; because it is in parentheses
      it becomes a capture group and is stored in the variable \1
\s    matches one more whitespace character, this time the last one
.*    matches anything after the last whitespace character
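
To make the capture group visible, here is a minimal sketch using base R's regexec() and regmatches(), which return the full match followed by each captured group. It uses an anchored variant of the pattern built from the question's own pieces (\S+ for the country key and the maturity date), not the exact pattern from this answer; the variable names strings, parts and coupons are illustrative only.

```r
# Sample bond descriptions from the question
strings <- c("DBR 0 1/2 02/15/25", "FRTR 3 04/25/22", "BTPS 1.35 04/15/22")

# regexec() records where the whole match and each capture group start;
# regmatches() turns those positions into substrings.
# ^\S+   country key, \s+ first space(s), (.*) coupon, \s+ last space(s), \S+$ maturity
parts <- regmatches(strings, regexec("^\\S+\\s+(.*)\\s+\\S+$", strings))

# Element 1 of each result is the full match, element 2 is the capture group.
# A string that does not match yields NA here.
coupons <- vapply(parts, function(p) p[2], character(1))
print(coupons)
```

Unlike sub(), which returns the input unchanged when the pattern does not match, this variant gives NA for strings that contain fewer than two spaces.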

Answer 2

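Here is a complete walk-through in base R:

# Sample data: one bond description per row
df <- data.frame(junk = c("DBR 0 1/2 02/15/25", "FRTR 3 04/25/22", "BTPS 1.35 04/15/22"), stringsAsFactors = FALSE)
df$coupon <- sapply(df$junk, function (item) {
  # Inner sub(): keep the text between the first and the last space.
  # Outer sub(): keep the trailing run of digits, dots and slashes (e.g. "1/2").
  frac <- sub(".*?([\\d./]+)$", "\\1", sub(".*?\\s+(.*)\\s.*", "\\1", item, perl = TRUE), perl = TRUE)
  # Evaluate the remaining text as an R expression, so "1/2" becomes 0.5
  eval(parse(text = frac))
})
df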

This yields

                junk coupon
1 DBR 0 1/2 02/15/25   0.50
2    FRTR 3 04/25/22   3.00
3 BTPS 1.35 04/15/22   1.35

The idea is to apply two regular expressions and evaluate the result with eval().

Alternatively, using dplyr and some error handling:
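
One caveat with this approach: the second regex keeps only the trailing "1/2" of "0 1/2", so a coupon such as "1 1/2" would evaluate to 0.5, and eval(parse()) will run whatever text it is handed. The following is a minimal sketch of an eval-free alternative, assuming coupons only ever consist of plain decimals and simple fractions separated by spaces. parse_coupon is a hypothetical helper, not part of the answer above.

```r
# Hypothetical helper: convert coupons such as "0 1/2", "3" or "1.35" to numbers
# without eval(). Assumes each token is a plain decimal or a fraction "a/b".
parse_coupon <- function(x) {
  vapply(strsplit(trimws(x), "\\s+"), function(tokens) {
    values <- vapply(tokens, function(tok) {
      if (grepl("^\\d+/\\d+$", tok)) {
        # Fraction: split on "/" and divide numerator by denominator
        nums <- as.numeric(strsplit(tok, "/", fixed = TRUE)[[1]])
        nums[1] / nums[2]
      } else if (grepl("^\\d+(\\.\\d+)?$", tok)) {
        # Plain integer or decimal
        as.numeric(tok)
      } else {
        NA_real_  # unrecognised token: propagate NA instead of failing
      }
    }, numeric(1))
    sum(values)  # whole part plus fractional part
  }, numeric(1))
}

result <- parse_coupon(c("0 1/2", "3", "1.35", "1 1/2"))
print(result)
```

Here the four inputs come out as 0.5, 3, 1.35 and 1.5, and anything that is not a recognised number (for example "someweirdojunk") becomes NA without needing tryCatch().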

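library(dplyr)

# tibble() replaces the deprecated data_frame()
df <- tibble(junk = c("DBR 0 1/2 02/15/25", 
                      "FRTR 3 04/25/22", 
                      "BTPS 1.35 04/15/22",
                      "someweirdojunk"))

make_coupon <- function(col) {
  result <- sapply(col, function (item) {
    tryCatch({
      # perl = TRUE so that \d is treated as a digit class inside [...]
      frac <- sub(".*?([\\d./]+)$", "\\1",
                  sub(".*?\\s+(.*)\\s.*", "\\1", item, perl = TRUE), perl = TRUE)
      eval(parse(text = frac))
    }, error = function(e) {
      # Anything that cannot be evaluated as a number becomes NA
      NA
    })
  })
  return(result)
}

df %>%
  mutate(coupon = make_coupon(junk))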

This produces:

# A tibble: 4 x 2
  junk               coupon
  <chr>               <dbl>
1 DBR 0 1/2 02/15/25  0.500
2 FRTR 3 04/25/22     3.00 
3 BTPS 1.35 04/15/22  1.35 
4 someweirdojunk      NA
