Match numbers or first letter before a character - regex

Question

I am trying to match a specific pattern in R in order to separate a column into multiple columns.

Consider the following example strings:

1-EXAMPLE
23-EXAMPLE2
A-EXAMPLE3
EXAMPLE-4

How can I write a regular expression for tidyr::extract so that the strings are separated as follows:

1   EXAMPLE
23  EXAMPLE2
A   EXAMPLE3
NA  EXAMPLE-4

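I want to split at the first - only if everything before it is digits, or a single letter (as in the third case), but not if there are more letters before it (as in example 4).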
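The rule above can be expressed as one anchored pattern. Below is a minimal base R check, assuming a valid prefix is one or more ASCII digits or exactly one uppercase letter placed directly before the first -; the variable names are illustrative only.

```r
# Example strings from the question
x <- c("1-EXAMPLE", "23-EXAMPLE2", "A-EXAMPLE3", "EXAMPLE-4")

# ^               anchor at the start of the string
# ([0-9]+|[A-Z])  either one or more digits, or exactly one uppercase letter
# -               the first hyphen must follow immediately
pattern <- "^([0-9]+|[A-Z])-"

# TRUE where the string should be split, FALSE where col1 should be NA
should_split <- grepl(pattern, x)
print(should_split)
```

Because the alternation only applies at the start of the string, "EXAMPLE-4" fails: after the single letter E the next character is X, not -.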

Thanks!

Answer 1

We can use case_when to prepend a - to the strings that lack a valid prefix, and then use extract

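library(dplyr)
library(stringr)
library(tidyr)
df1 %>% 
    # strings without a valid prefix (digits or one letter before "-")
    # get a leading "-" so that the extract pattern below still matches them
    mutate(col1 = case_when(str_detect(trimws(col1), '^([A-Z]|[0-9]+)\\s*-', 
       negate = TRUE) ~ str_c('-', col1), TRUE ~ trimws(col1))) %>% 
    # group 1: optional prefix; group 2: everything after the first "-"
    extract(col1, into = c('col1', 'col2'), '^([A-Z]|\\d+)?\\s*-(.*)') %>% 
    # an empty prefix becomes NA
    mutate(col1 = na_if(col1, ''))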

-output

  col1      col2
1    1   EXAMPLE
2   23  EXAMPLE2
3    A  EXAMPLE3
4 <NA> EXAMPLE-4

data

df1 <- structure(list(col1 = c("1-EXAMPLE", "23-EXAMPLE2", "A-EXAMPLE3", 
"EXAMPLE-4")), class = "data.frame", row.names = c(NA, -4L))
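For comparison, here is a minimal base R sketch that skips the case_when step: it tests for a valid prefix with grepl and pulls out the two groups with sub. The helper name split_prefix is hypothetical. Like the question, it assumes a valid prefix is one or more digits or exactly one uppercase letter. Unlike the answer's pattern, it does not allow whitespace before the -.

```r
# Hypothetical helper: split at the first "-" only when the prefix is
# one or more digits or a single uppercase letter; otherwise col1 is NA
# and col2 keeps the whole string.
split_prefix <- function(x) {
  pattern <- "^([0-9]+|[A-Z])-(.*)$"
  has_prefix <- grepl(pattern, x)
  data.frame(
    col1 = ifelse(has_prefix, sub(pattern, "\\1", x), NA_character_),
    col2 = ifelse(has_prefix, sub(pattern, "\\2", x), x),
    stringsAsFactors = FALSE
  )
}

# Same data as above, built with data.frame() instead of structure()
df1 <- data.frame(col1 = c("1-EXAMPLE", "23-EXAMPLE2", "A-EXAMPLE3", "EXAMPLE-4"),
                  stringsAsFactors = FALSE)
res <- split_prefix(df1$col1)
print(res)
```

Since the pattern is anchored at both ends, sub replaces the whole string with the selected group. Strings that do not match are never passed through sub's result, because ifelse picks the fallback branch for them.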
