Separate columns with regular expressions

Question

I'm having trouble finding the right regular expression to split a single column into two.

Here is my example.

Col 1
8.3 algo y algo mas

I want this:

Col 1    Col 2
8.3       algo y algo mas

I have been trying this code, without success.

library(tidyverse)
base <- base %>%
    separate(col 1, into c("col 2", "col 3"), sep = "\s")

Answer 1

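To be safe, I think it is best to first replace the space after the leading number with an easily recognizable character, such as an underscore: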

# Backslashes must be doubled inside R strings: '\\1' refers to the first capture group.
df[, 'Col 1'] <- gsub(pattern = '^([0-9.]+) ', replacement = '\\1_', x = df[, 'Col 1'])

Then I would use separate:

df <- separate(data = df, col = 'Col 1', into = c('Col 1', 'Col 2'), sep = '_')

I would also change the column names, because a space in a column name is usually a problem. Try changing it to col_1.
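The two steps above can be sketched end to end as follows. This is a minimal sketch under a few assumptions: the sample rows are made up for illustration, the column is already named Col_1 as suggested, and the text after the number contains no underscore of its own. The last call shows an alternative that is not part of the answer above: tidyr's separate() with extra = "merge", which splits on the first space only and so needs no marker character.

```r
library(tidyr)

# Illustrative data (second row is hypothetical, added to show a longer number).
df <- data.frame(Col_1 = c("8.3 algo y algo mas", "12.75 otra cosa"),
                 stringsAsFactors = FALSE)

# Step 1: replace only the space that follows the leading number with "_".
marked <- data.frame(Col_1 = sub("^([0-9.]+) ", "\\1_", df$Col_1),
                     stringsAsFactors = FALSE)

# Step 2: split on the marker.
step <- separate(marked, col = "Col_1", into = c("Col_1", "Col_2"), sep = "_")

# Alternative: split on the first space and merge everything after it
# into the last column, skipping the marker step entirely.
alt <- separate(df, col = "Col_1", into = c("Col_1", "Col_2"),
                sep = " ", extra = "merge")

print(step)
print(alt)
```

Both results hold character columns; passing convert = TRUE to separate() is one way to turn the first column into a number.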

Answer 2

You can try the functions in stringr and rebus:

df <- data.frame(Col_1 = "8.3 algo y algo mas")

library(stringr)
library(rebus)
str_match(df$Col_1, pattern = capture(DGT %R% DOT %R% DGT) %R%
                              SPC %R%
                              capture(one_or_more(or(SPC, LOWER))))

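The rebus package lets you build a regular expression piece by piece from human-readable code. The output is a character matrix: the first column holds the full match and the remaining columns hold the capture groups.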

#      [,1]                  [,2]  [,3]             
# [1,] "8.3 algo y algo mas" "8.3" "algo y algo mas"
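To get from that matrix back to a two-column data frame, the capture-group columns can be picked out by position. The sketch below is only an illustration: it uses a hand-written pattern instead of rebus, and that pattern is an assumption that is deliberately broader than the one above (any number of digits on either side of the dot, and any text after the first whitespace character).

```r
library(stringr)

df <- data.frame(Col_1 = "8.3 algo y algo mas", stringsAsFactors = FALSE)

# Assumed pattern: a decimal number, one whitespace character, then the rest.
m <- str_match(df$Col_1, "^(\\d+\\.\\d+)\\s(.*)$")

# m[, 1] is the full match; m[, 2] and m[, 3] are the two capture groups.
result <- data.frame(Col_1 = as.numeric(m[, 2]),
                     Col_2 = m[, 3],
                     stringsAsFactors = FALSE)

print(result)
```

Rows that do not match the pattern come back as NA in every column of the matrix, so it may be worth checking for NA before relying on the result.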

