Street address cleaning in R

The street addresses I need to clean look like this, for example:

121/122/123 15th clyde road

121-122 10th broward lane

I am using a regular expression in R, something like:

sub((\s*[/ ]\d+\s*){1, }   

It gives me almost the output I want, but the terms "10th" and "15th" are being reduced to "th", which is not what should happen.
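The snippet above is truncated, so the following is only a guess at the full call: a minimal sketch assuming the pattern was passed to sub() with an empty replacement. Under that assumption, the space inside the character class [/ ] lets the match run on from the unit numbers into the house number, which would explain the stray "th".

```r
# Assumed reconstruction of the truncated call above; the replacement
# argument and the input string are guesses, not the asker's exact code.
x <- "121/122/123 15th clyde road"

# [/ ] accepts a slash OR a space, so " 15" is consumed as one more repetition
res <- sub("(\\s*[/ ]\\d+\\s*){1,}", "", x)
print(res)
```

Removing the space from the character class, so that only separators such as / and - can introduce a number, avoids swallowing the house number.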

The correct output should be:

(i) 121 15th clyde road

(ii) 121 10th broward lane

Data:

str <- c("121/122/123 15th clyde road", "121-122 10th broward lane", 
     "357 / 122 /123 17th peter road", "121- 122- 123 -124 10th aremand road")

Solution:

gsub("(/(\\s)?\\d+|-(\\s)?\\d+){1,}", "", str)
[1] "121 15th clyde road"    "121 10th broward lane"  "357   17th peter road"  "121  10th aremand road"
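The output above still contains runs of spaces wherever the removed numbers were surrounded by whitespace. A minimal sketch of one way to tidy that up in base R follows. It reuses the data from the question under the illustrative name addr (chosen here to avoid masking utils::str), and the first pattern is a condensed form of the one above rather than the answer's exact expression.

```r
addr <- c("121/122/123 15th clyde road", "121-122 10th broward lane",
          "357 / 122 /123 17th peter road", "121- 122- 123 -124 10th aremand road")

# Step 1: drop every "/<digits>" or "-<digits>" piece
# (an optional single whitespace character is allowed after the separator)
stripped <- gsub("([/-]\\s?\\d+)+", "", addr)

# Step 2: collapse repeated whitespace into one space and trim both ends
cleaned <- trimws(gsub("\\s+", " ", stripped))
print(cleaned)
```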

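Given your examples, you only need to remove each run of characters that starts with / or - and is followed by digits. This is easy to do with the regular expression [-/]\d+ (written as "[-/]\\d+" inside an R string).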
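To see which pieces that pattern picks out before deleting anything, the matches can be listed with base R's gregexpr() and regmatches(). This is an illustrative check, not part of the original answer.

```r
v <- c("121/122/123 15th clyde road", "121-122 10th broward lane")

# [-/] : exactly one hyphen or slash;  \\d+ : one or more digits right after it
hits <- regmatches(v, gregexpr("[-/]\\d+", v))
print(hits)
```

Because every match has to begin with a separator, the leading "121" and the ordinals "15th" and "10th" are never touched.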

Using stringr:

library(stringr)

v <- c("121/122/123 15th clyde road", "121-122 10th broward lane")
str_remove_all(v, "[-/]\\d+")
# [1] "121 15th clyde road"   "121 10th broward lane"

Or using the gsub function:

v <- c("121/122/123 15th clyde road", "121-122 10th broward lane")
gsub("[-/]\\d+", "", v)
# [1] "121 15th clyde road"   "121 10th broward lane"
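The pattern [-/]\d+ assumes the separator is attached directly to the digits. That holds for the two addresses in v, but not for entries such as "357 / 122 /123 17th peter road" in the question's data. Below is a hedged sketch of a variant that also absorbs whitespace around the separator. The pattern is an adaptation, not taken from the answer above, and it would also strip a legitimate hyphenated number if one appeared later in an address.

```r
# Full data from the question, stored as addr (an illustrative name)
addr <- c("121/122/123 15th clyde road", "121-122 10th broward lane",
          "357 / 122 /123 17th peter road", "121- 122- 123 -124 10th aremand road")

# \\s* on both sides of the separator absorbs any surrounding whitespace,
# so no extra spaces are left behind after the removal
cleaned <- gsub("\\s*[-/]\\s*\\d+", "", addr)
print(cleaned)
```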