Street address cleaning in R
I have street addresses to clean, for example:
121/122/123 15th clyde road
121-122 10th broward lane
I am using a regular expression in R, something like:
sub((\s*[/ ]\d+\s*){1, }
It gives me almost the output I want, but the terms "10th" and "15th" are reduced to "th", which should not happen.
The correct output should be:
(i) 121 15th clyde road
(ii) 121 10th broward lane
Data:
str <- c("121/122/123 15th clyde road", "121-122 10th broward lane",
"357 / 122 /123 17th peter road", "121- 122- 123 -124 10th aremand road")
Solution:
gsub("(/(\\s)?\\d+{1,}|-(\\s)?\\d+){1,}", "", str)
[1] "121 15th clyde road" "121 10th broward lane" "357 17th peter road" "121 10th aremand road"
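To see why a pattern that treats a space as a separator also eats the ordinals, it can help to list what each pattern actually matches. The sketch below is only an illustration: `loose` is a hypothetical stand-in for a class like the question's `[/ ]` (it is not the original call), and `strict` is a simplified pattern that requires a real `/` or `-`.

```r
x <- "121/122/123 15th clyde road"

# Hypothetical pattern: a space is accepted as a separator, like [/ ] in the question
loose <- "[/ ]\\d+"
# Simplified pattern: only / or - may precede the digits
strict <- "[-/]\\d+"

# Extract every match so the difference is visible
loose_hits  <- regmatches(x, gregexpr(loose, x))[[1]]
strict_hits <- regmatches(x, gregexpr(strict, x))[[1]]

print(loose_hits)   # includes " 15", the digits of "15th"
print(strict_hits)  # only the extra unit numbers
```

With the loose pattern the digits of "15th" are matched because the space in front of them counts as a separator, so only "th" is left after removal.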
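Given your examples, you only need to remove each run of characters that starts with / or - and is followed by digits. You can do this easily with the regular expression [-/]\d+ (written "[-/]\\d+" in an R string, where the backslash must be escaped).
With stringr: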
library(stringr)
v <- c("121/122/123 15th clyde road", "121-122 10th broward lane")
str_remove_all(v, "[-/]\\d+")
# [1] "121 15th clyde road" "121 10th broward lane"
Or with the gsub function:
v <- c("121/122/123 15th clyde road", "121-122 10th broward lane")
gsub("[-/]\\d+", "", v)
# [1] "121 15th clyde road" "121 10th broward lane"
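The two-element vector above has no spaces around the separators, while the question's data also contains entries such as "357 / 122 /123 ...". A minimal sketch that covers those cases follows. It assumes that a `/` or `-` followed by digits never belongs to the part of the address you want to keep. `clean_address` is a hypothetical helper name, not something from the answers above.

```r
addresses <- c("121/122/123 15th clyde road", "121-122 10th broward lane",
               "357 / 122 /123 17th peter road", "121- 122- 123 -124 10th aremand road")

# Hypothetical helper: drop "/<digits>" or "-<digits>" with optional spaces
# on either side of the separator, then normalise any leftover whitespace
clean_address <- function(x) {
  x <- gsub("\\s*[-/]\\s*\\d+", "", x)
  trimws(gsub("\\s+", " ", x))
}

cleaned <- clean_address(addresses)
print(cleaned)
```

Because the separator is still required, the ordinals ("10th", "15th", "17th") are left alone. The final whitespace pass is there only as a safeguard against stray double spaces.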