R: Remove dots in text but not those marking decimal points

Question

I am new to regular expressions, so please bear with me.

I have a string like this:

txt1 <- 'a,b,a.b,a.,1,2,1.2,1.,.,11,222,11.222,11.'

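Suppose it came from a .csv file, with each cell separated by ','. Now I want to remove every '.' except those that mark a decimal point. In the end, I want to end up with this: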

txt2 <- 'a,b,ab,a,1,2,1.2,1,,11,222,11.222,11'

I tried the following code:

txt2 <- gsub(pattern = '[^a-z0-9,(\\d\\.\\d)]', replacement = '', x = txt1)
txt2 <- gsub(pattern = '[^a-z0-9,|(\\d\\.\\d)]', replacement = '', x = txt1)

But neither works; both return

> print(txt2)
[1] "a,b,a.b,a.,1,2,1.2,1.,.,11,222,11.222,11."

Any idea how to correct my code? Thanks!

Answer 1

You can use a negative lookahead. Match \.(?!\d) and replace it with nothing (an empty string).

https://regex101.com/r/LNHYOY/1

Answer 2

The key is to use a negative lookbehind (?<!...) and a negative lookahead (?!...).

> txt1 <- 'a,b,a.b,a.,1,2,1.2,1.,.,11,222,11.222,11.'
> txt2 <- gsub(pattern='((?<![0-9])\\.)|(\\.(?![0-9]))', replacement='', x=txt1, perl=TRUE)
> txt2
[1] "a,b,ab,a,1,2,1.2,1,,11,222,11.222,11"

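This pattern matches a period \. that is not preceded by a digit 0-9, or a period that is not followed by a digit 0-9. You must set perl=TRUE for R to recognize lookbehind and lookahead.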

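Note that this also strips leading periods, so ".2" becomes "2". If that is not wanted, the lookbehind needs to be (?<![0-9,]) instead, so that a period directly after a comma is no longer caught by the first alternative.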
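
To make the difference concrete, here is a minimal sketch comparing the two lookbehind variants. The input string `cells` is made up for illustration and is not from the question; the expected results in the comments follow from tracing each dot against the pattern.

```r
# Sketch only: `cells` is a hypothetical input, not data from the question.
cells <- 'a,.2,1.5,3.'

# Pattern from this answer: a dot not preceded by a digit,
# or a dot not followed by a digit.
strict <- '((?<![0-9])\\.)|(\\.(?![0-9]))'

# Variant with the comma added to the lookbehind character class.
lenient <- '((?<![0-9,])\\.)|(\\.(?![0-9]))'

res_strict  <- gsub(strict,  '', cells, perl = TRUE)
res_lenient <- gsub(lenient, '', cells, perl = TRUE)

print(res_strict)   # [1] "a,2,1.5,3"   (leading dot of ".2" removed)
print(res_lenient)  # [1] "a,.2,1.5,3"  (leading dot of ".2" kept)
```

One caveat with the variant: a period at the very start of the string has no preceding character at all, so the lookbehind still lets it through and it is removed.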

Answer 3

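The negative lookahead (as suggested by @CAustin) seems to be the most elegant and concise.

Since none of the solutions above gave you the actual R code, here it is: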

txt2 <- gsub("\\.(?!\\d)", "", txt1, perl = TRUE)
[1] "a,b,ab,a,1,2,1.2,1,,11,222,11.222,11"
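
If the cells are already split into a character vector, the same call works element by element, since gsub is vectorized over its input. Below is a rough sketch; the helper name `strip_text_dots` is hypothetical and only wraps the call above.

```r
# Hypothetical helper (not from the original answers): removes every dot
# that is not immediately followed by a digit.
strip_text_dots <- function(x) {
  gsub("\\.(?!\\d)", "", x, perl = TRUE)
}

txt1 <- 'a,b,a.b,a.,1,2,1.2,1.,.,11,222,11.222,11.'

# Split into individual cells, clean each one, then join them again.
cells   <- strsplit(txt1, ",", fixed = TRUE)[[1]]
cleaned <- strip_text_dots(cells)
rejoined <- paste(cleaned, collapse = ",")

print(rejoined)  # [1] "a,b,ab,a,1,2,1.2,1,,11,222,11.222,11"
```

Because this pattern only looks ahead, any dot followed by a digit is kept regardless of what precedes it: ".5" stays ".5", and "a.1" also stays "a.1". Whether that is acceptable depends on the data.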

regex

string

r

gsub