String Matching in R like PRXMATCH() in SAS
I am looking for help with matching strings in R the way PRXMATCH() does in SAS.
List1 <- c("lead", "good")
List2 <- c("Quality", "understand")
Name  <- c("grp1", "grp2")
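I have a data frame with a column sentence. For each sentence, I need to:
- Look for a word from List1.
- If a word is found, look for the corresponding word from List2.
- If that word is found within 5 words (before or after) of the word from List1, add the corresponding name from Name to the result column.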
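For example, "lead" is searched for in all sentences. When "lead" is found in a sentence and "Quality" occurs within 5 words before or after it in that same sentence, "grp1" should be added to the result column; otherwise the match is discarded.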
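The rule described above can also be sketched without regular expressions, by splitting each sentence into words and comparing word positions. This is only an illustration: within_distance() is a hypothetical helper name, not something from the original post, and it assumes words are separated by whitespace, carry no punctuation, and should be compared case-insensitively.

```r
# Hypothetical helper (not from the original post): returns TRUE when w1 and
# w2 occur within max_dist word positions of each other in the sentence.
# Assumes whitespace-separated words without punctuation.
within_distance <- function(sentence, w1, w2, max_dist = 5) {
  words <- tolower(strsplit(sentence, "\\s+")[[1]])
  pos1 <- which(words == tolower(w1))
  pos2 <- which(words == tolower(w2))
  if (length(pos1) == 0 || length(pos2) == 0) return(FALSE)
  # Compare every position of w1 with every position of w2
  any(abs(outer(pos1, pos2, "-")) <= max_dist)
}

# "quality" is 4 positions away from "lead"
within_distance("The quality bla bla bla lead bla", "lead", "Quality")
# "quality" is 6 positions away from "lead"
within_distance("The quality bla bla bla bla bla lead bla", "lead", "Quality")
```

The first call should report a match and the second should not, since the two words are 4 and 6 positions apart respectively.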
Maybe something like this?
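# Sample data: one sentence per row, with an empty Result column to fill in
myData <- data.frame(sentence = c("The quality bla bla bla lead bla",
                                  "The quality bla bla bla bla bla lead bla",
                                  "The lead quality bla bla",
                                  "The lead bla bla quality",
                                  "The lead bla bla bla bla bla quality of",
                                  "It allows us to understand how good bla",
                                  "It is good to understand that bla",
                                  "It is also good bla bla bla if we understand",
                                  "lead quality is good to understand"),
                     Result = "",
                     stringsAsFactors = FALSE)

List1 <- c("lead", "good")
List2 <- c("quality", "understand")
Name  <- c("grp1", "grp2")

# One pattern per (List1[i], List2[i]) pair: the two words in either order,
# with at most 4 other words in between (i.e. within 5 words of each other).
# Backslashes must be doubled inside R string literals.
patterns <- paste0("(\\b", List1, "\\s+(\\w+\\s+){0,4}", List2, "\\b)|",
                   "(\\b", List2, "\\s+(\\w+\\s+){0,4}", List1, "\\b)")

# Append the group name to Result for every sentence that matches pattern i
for (i in seq_along(patterns)) {
  myData$Result <- ifelse(grepl(pattern = patterns[i], x = myData$sentence),
                          yes = paste(myData$Result, Name[i]),
                          no = myData$Result)
}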
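Note that the question spells the second word "Quality" while the sample sentences contain "quality"; the answer sidesteps this by lowercasing List2. If the word lists should be matched regardless of case (similar to the /i modifier in a PRXMATCH() pattern), one option is the ignore.case argument of grepl(). The sketch below follows the pattern structure used in the answer for a single word pair; the variable names pattern and sentences are introduced here only for illustration.

```r
# One pattern for the pair "lead" / "Quality": either order, at most 4 words between
pattern <- paste0("(\\b", "lead", "\\s+(\\w+\\s+){0,4}", "Quality", "\\b)|",
                  "(\\b", "Quality", "\\s+(\\w+\\s+){0,4}", "lead", "\\b)")

sentences <- c("The quality bla bla bla lead bla",
               "The lead bla bla bla bla bla quality of")

# Case-sensitive: "Quality" does not match "quality"
case_sensitive <- grepl(pattern, sentences)

# Case-insensitive, using Perl-compatible regular expressions
case_insensitive <- grepl(pattern, sentences, ignore.case = TRUE, perl = TRUE)

print(case_sensitive)
print(case_insensitive)
```

With ignore.case = TRUE only the first sentence should match, because in the second one the two words are more than 5 words apart.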
Result
> myData
                                      sentence     Result
1             The quality bla bla bla lead bla       grp1
2     The quality bla bla bla bla bla lead bla
3                     The lead quality bla bla       grp1
4                     The lead bla bla quality       grp1
5      The lead bla bla bla bla bla quality of
6      It allows us to understand how good bla       grp2
7            It is good to understand that bla       grp2
8 It is also good bla bla bla if we understand
9           lead quality is good to understand  grp1 grp2
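Because paste() joins its arguments with a space by default, each label appended to an initially empty Result is preceded by a space. If that matters, the column can be tidied afterwards with trimws() from base R. A small sketch, using a stand-alone vector named result (introduced here for illustration) in place of myData$Result:

```r
# Values as produced by paste("", "grp1"), an untouched "", and two appended labels
result <- c(paste("", "grp1"), "", paste(paste("", "grp1"), "grp2"))

# Remove leading and trailing whitespace
cleaned <- trimws(result)
print(cleaned)
```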