How to identify locations from text
Here is the example code I am working with:
df <- read.csv("secondary.csv", header = TRUE)
S <- "s / O sk hungu 101 / 90 MODEL HOUSE TALAB GAGNI SHUKUL LUCKNOW UTTAR PRADESH LUCKNOW UTTAR PRADESH 226001"
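I suggest generating every N-x substring, where N is the length of the string and x is the number of characters dropped from the front (in other words, every suffix of the string).
# Split the string into single characters
allchr <- unlist(strsplit(S, ""))
# Rebuild the suffix that starts at each character position
listsubstr <- sapply(seq_along(allchr), function(i) paste0(allchr[i:length(allchr)], collapse = ""))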
# [1] "s / O sk hungu 101 / 90 MODEL HOUSE TALAB GAGNI SHUKUL LUCKNOW UTTAR PRADESH LUCKNOW UTTAR PRADESH 226001"
# [2] " / O sk hungu 101 / 90 MODEL HOUSE TALAB GAGNI SHUKUL LUCKNOW UTTAR PRADESH LUCKNOW UTTAR PRADESH 226001"
# [3] "/ O sk hungu 101 / 90 MODEL HOUSE TALAB GAGNI SHUKUL LUCKNOW UTTAR PRADESH LUCKNOW UTTAR PRADESH 226001"
# [4] " O sk hungu 101 / 90 MODEL HOUSE TALAB GAGNI SHUKUL LUCKNOW UTTAR PRADESH LUCKNOW UTTAR PRADESH 226001"
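Most of the character-level suffixes above start in the middle of a word. As a possible variation (my assumption, not something the question requires), you could drop whole words instead of single characters, which leaves far fewer candidates to check. The name `wordsubstr` is made up for this sketch.

```r
# Word-level variation (an assumption, not the original approach):
# split on whitespace and rebuild each suffix from whole words.
S <- "s / O sk hungu 101 / 90 MODEL HOUSE TALAB GAGNI SHUKUL LUCKNOW UTTAR PRADESH LUCKNOW UTTAR PRADESH 226001"
words <- strsplit(trimws(S), "\\s+")[[1]]

# One suffix per word position, joined back with single spaces
wordsubstr <- vapply(
  seq_along(words),
  function(i) paste(words[i:length(words)], collapse = " "),
  character(1)
)

length(wordsubstr)   # one candidate per word instead of one per character
tail(wordsubstr, 3)  # the shortest candidates, ending with the postal code
```

This only helps if a valid location always starts on a word boundary, which seems likely for addresses but is not guaranteed.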
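You can iterate over this list to check for valid geocodes. I have to give pseudocode here because I am not sure how to check whether a string is a valid geocode.
sapply(listsubstr, is.geocode) # pseudocode: is.geocode() is a placeholder for your own validity check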
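To make the pseudocode concrete, here is a minimal sketch that stands in a tiny lookup table for the real check. Everything about the check is an assumption: `known_places` is a made-up gazetteer, and this `is.geocode` is a hypothetical helper, not a function from any package. In practice you would replace it with a lookup against your own place list or a geocoding service.

```r
# Hypothetical gazetteer; a real one would come from a file or a service.
known_places <- c("LUCKNOW UTTAR PRADESH 226001", "UTTAR PRADESH 226001")

# Hypothetical validity check: TRUE when the trimmed string is a known place.
is.geocode <- function(x) trimws(x) %in% known_places

S <- "s / O sk hungu 101 / 90 MODEL HOUSE TALAB GAGNI SHUKUL LUCKNOW UTTAR PRADESH LUCKNOW UTTAR PRADESH 226001"

# All suffixes of S, longest first
listsubstr <- substring(S, seq_len(nchar(S)))

# Keep the suffixes that pass the check, then drop whitespace-only duplicates
is_hit <- vapply(listsubstr, is.geocode, logical(1), USE.NAMES = FALSE)
hits <- unique(trimws(listsubstr[is_hit]))
hits
```

Because the suffixes are ordered from longest to shortest, the first hit is the longest matching location.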
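You could also do this with recursion, though.
myfun <- function(x) {
  # Base case: nothing left to test, so no geocode was found
  if (nchar(x) == 0) {
    return(NA_character_)
  }
  if (is.geocode(x)) { # pseudocode: is.geocode() is a placeholder for your own validity check
    return(x)
  } else {
    # Drop the first character and try again on the remainder
    myfun(substr(x, 2, nchar(x)))
  }
}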
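For long strings, a loop does the same job without growing the call stack. The sketch below is one way to write it. The name `first_valid_suffix` and the `is_valid` argument are made up for illustration, and the check used in the example is a toy stand-in for a real geocode test.

```r
# Return the longest suffix of x that passes is_valid, or NA if none does.
# is_valid is any function that takes one string and returns TRUE or FALSE.
first_valid_suffix <- function(x, is_valid) {
  while (nchar(x) > 0) {
    if (is_valid(x)) {
      return(x)
    }
    # Drop the first character and test the remainder
    x <- substr(x, 2, nchar(x))
  }
  NA_character_
}

# Toy check for the example only: accept exactly one known string
toy_check <- function(s) s == "LUCKNOW 226001"

found <- first_valid_suffix("12 MAIN ST LUCKNOW 226001", toy_check)
missing <- first_valid_suffix("NO MATCH HERE", toy_check)
```

Passing the check in as an argument means you can swap in a real geocode test later without touching the loop.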