R extract function gives a weird error about the regex

Question

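I have an Excel file with a single column. Each cell in this column contains a number followed by the state that corresponds to that number.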

State
01 Alabama
02 Alaska

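And so on. I want to split this column into two columns, one containing the number and the other containing the state name. I tried using extract() from tidyr: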

library(readxl)   # for read_xlsx()
library(dplyr)    # for the %>% pipe
df <- read_xlsx("States.xlsx")
df %>% tidyr::extract(States,c("A","B"),sep="(\\d\\d) ([a-zA-Z]+)")

However, it throws this error:

Error: `regex` should define 2 groups;  found.
Error: 1 components of `...` were not used.

We detected these problematic arguments:
* `sep`

Did you misspecify an argument?

Any idea what I'm doing wrong? Thanks!

Answer 1

According to ?extract:

regex - a regular expression used to extract the desired values. There should be one group (defined by ()) for each element of into.

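There is no argument named sep in extract(); the argument you want is regex. sep is an argument of separate(), not of extract(). Because sep is not a recognised argument, extract() falls back to its default regex, which does not define the two groups needed for the two output columns in into: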
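For comparison, here is a minimal sketch of where sep does belong: separate() splits on a delimiter rather than capturing groups. The data frame below is a hypothetical stand-in for the Excel file (the column name States and the sample rows are assumptions for illustration). extra = "merge" keeps a multi-word name such as "New Hampshire" in one piece. In tidyr 1.3.0 and later, separate() is marked superseded in favour of separate_wider_delim(), but it still works.

```r
library(tidyr)

# Hypothetical stand-in for the Excel data (the real file is not available here)
df <- data.frame(
  States = c("01 Alabama", "02 Alaska", "33 New Hampshire"),
  stringsAsFactors = FALSE
)

# separate() splits on the delimiter given by `sep` (here a single space).
# extra = "merge" puts everything after the first space into the last column,
# so multi-word state names are not truncated.
out <- separate(df, States, into = c("A", "B"), sep = " ", extra = "merge")
print(out)
```

Here sep is a delimiter, whereas extract() needs a regex with one capture group per output column.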

library(dplyr)
df %>% 
  tidyr::extract(States, c("A","B"), regex = "(\\d\\d) ([a-zA-Z]+)")
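A self-contained sketch of the corrected call, using a small hypothetical data frame instead of States.xlsx. Note that [a-zA-Z]+ stops at the first space, so for a row like "33 New Hampshire" it would capture only "New"; this sketch swaps in the anchored pattern ^(\\d+) (.+)$ so the whole name lands in column B. Each pair of parentheses is one capture group, matching the two names in into.

```r
library(dplyr)
library(tidyr)

# Hypothetical sample rows standing in for the Excel column
df <- data.frame(
  States = c("01 Alabama", "02 Alaska", "33 New Hampshire"),
  stringsAsFactors = FALSE
)

out <- df %>%
  # Two capture groups -> two output columns (A = number, B = state name).
  # Backslashes are doubled because this is an R string literal.
  extract(States, into = c("A", "B"), regex = "^(\\d+) (.+)$")
print(out)
```

Column A stays character ("01", "02", ...); adding convert = TRUE would turn it into integers and drop the leading zeros.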
