What am I doing wrong in this gsub example?

Question

I'm looking at this tutorial for using RegEx with stringr. It uses the following example:

str <- c("i.e., George W. Bush", "Lyndon B. Johnson, etc.")
gsub("([A-Z])[.]?", "\\1", str)

The tutorial tells me the output will be:

[1] "George W Bush"    "Lyndon B Johnson"

But when I ran the identical script in R, this is what I got:

str <- c("i.e., George W. Bush", "Lyndon B. Johnson, etc.")
gsub("([A-Z])[.]?", "\\1", str)
[1] "i.e., George W Bush"    "Lyndon B Johnson, etc."

It just returns the original text. Even when I run it on a regex testing site, it still spits out the same thing.

Am I doing something wrong (possible)? Or is the tutorial wrong (doubtful)? I feel like I'm taking crazy pills here (confirmed).

Answer 1

It looks like you did it right; the error is actually in the tutorial. I also tested the regex, and you can see it here. The regex you were given captures any uppercase letter that may or may not be followed by a dot. For instance, "W." in "George W. Bush" is replaced with "W", but "i.e." is neither matched nor replaced because none of its characters are uppercase. If we had "I.E." instead, it would be replaced with "IE". To capture only the names, we need a different regex. One approach is to capture the first name, middle initial, and last name. You could get that effect with the regex .*([A-Z][a-z]+)\s([A-Z])[.]+\s([A-Z][a-z]+).* (see here), or in R with:

str <- c("i.e., George W. Bush", "Lyndon B. Johnson, etc.")
gsub(".*([A-Z][a-z]+) ([A-Z])[.]+ ([A-Z][a-z]+).*", "\\1 \\2 \\3", str)
#> [1] "George W Bush"    "Lyndon B Johnson"

But this is probably not the most efficient way to clean up names.
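To make the behaviour of the tutorial's pattern concrete, here is a small check in base R. The variable name tutorial_pattern and the "I.E." input are only illustrative and do not come from the tutorial. Note that inside an R string literal the backreference has to be written as "\\1".

```r
# The tutorial's pattern: one uppercase letter, optionally followed by a dot.
# (tutorial_pattern is a name chosen for this sketch only.)
tutorial_pattern <- "([A-Z])[.]?"

# Lowercase "i.e." contains no uppercase letter, so it is left alone;
# only the dot after "W" is dropped.
lower_result <- gsub(tutorial_pattern, "\\1", "i.e., George W. Bush")
print(lower_result)
#> [1] "i.e., George W Bush"

# Uppercase "I.E." matches twice, so both of its dots are dropped as well.
upper_result <- gsub(tutorial_pattern, "\\1", "I.E., George W. Bush")
print(upper_result)
#> [1] "IE, George W Bush"
```

Either way the prefix is never removed, which is why the tutorial's stated output cannot come from that pattern alone.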

regex

r

stringr