How do I delete every word in a string that contains ".com" and is a link?
How can I get the expected output from text like the example below?
x<-c("Commerce recommend erkanexample.com.tr. This site erkanexample.com. erkandeneme.com is widely. The company name is apple.commerce is coma. spread")
x<-gsub("(.com)\\S+", "",x)
x
[1] "Commerce r erkanexample This site erkanexample erkandeneme.com is widely. The name is apple is"
Expected output:
[1] "Commerce recommend This site. is widely. The company name is apple.commerce is coma. spread"
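One likely reason the attempt over-matches is that an unescaped `.` matches any character, so `.com` also hits the "ecom" inside "recommend". The sketch below illustrates this on a shortened sample string (`s`, `loose` and `strict` are names made up for this illustration, not part of the question):

```r
# Hypothetical shortened sample, not the full string from the question.
s <- "We recommend site.com.tr. today"

# Unescaped dot: "." matches any character, so "ecom" + "mend" is removed too.
loose <- gsub(".com\\S+", "", s)

# Escaped dot: only a literal ".com" (plus whatever follows it) is removed.
strict <- gsub("\\.com\\S*", "", s)

print(loose)   # "We r site today"
print(strict)  # "We recommend site today"
```

Even the escaped version leaves the stem "site" behind, because it deletes from ".com" onward rather than the whole word. The answers address that by matching or filtering entire words.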
The stringr package provides functions for basic string manipulation:
library(stringr)
library(dplyr)  # for %>%
x %>%
  str_split(" ") %>%                               # split on spaces
  unlist() %>%
  str_subset("\\.com($|\\.)", negate = TRUE) %>%   # drop words where ".com" ends the word or is followed by "."
  str_c(collapse = " ")
This gives:
"Commerce recommend This site is widely. The company name is apple.commerce is coma. spread"
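After the edit:
x %>%
  str_split(" ") %>%
  unlist() %>%
  str_subset("\\.com$", negate = TRUE) %>%   # drop words ending in ".com"
  str_replace(".*\\.com.*\\.$", ".") %>%     # reduce a link followed by a period to just the period
  str_c(collapse = " ") %>%
  str_replace_all(" \\.", ".")               # reattach each period to the preceding word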
This gives:
"Commerce recommend. This site. is widely. The company name is apple.commerce is coma. spread"
Idea: split on spaces, detect which words contain .com, keep the ones that do not, and join them back together.
x<-c("Commerce recommend erkanexample.com.tr. This site erkanexample.com. erkandeneme.com is widely. The company name is apple.commerce is coma. spread")
library(stringr)  # for str_split()
split_str <- str_split(x, " ", simplify = FALSE)[[1]]
# keep only the words that do not contain a literal ".com"
paste(split_str[!grepl("[.]com", split_str)], collapse = " ")
which gives:
[1] "Commerce recommend This site is widely. The company name is is coma. spread"
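Note that the pattern `[.]com` also matches `apple.commerce`, which the expected output keeps. A minimal base R sketch of the same split-and-filter idea, assuming a link is a word in which ".com" is either at the end or followed by another period (`words`, `is_link` and `kept` are illustrative names):

```r
x <- "Commerce recommend erkanexample.com.tr. This site erkanexample.com. erkandeneme.com is widely. The company name is apple.commerce is coma. spread"

# Split on single spaces; fixed = TRUE treats " " as a literal, not a regex.
words <- strsplit(x, " ", fixed = TRUE)[[1]]

# Assumption: a link has ".com" at the end of the word or followed by ".".
is_link <- grepl("[.]com($|[.])", words)

# Keep the non-link words and join them back together.
kept <- paste(words[!is_link], collapse = " ")
print(kept)
# "Commerce recommend This site is widely. The company name is apple.commerce is coma. spread"
```

This drops the trailing periods together with the links, so "recommend" and "site" end up without sentence-ending punctuation.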
Is this what you want?
gsub("\\s[a-z]+\\.com(\\.[a-z]+)?\\b", "", x)
[1] "Commerce recommend. This site. is widely. The company name is apple.commerce is coma. spread"
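Here, we replace the following with nothing (in R source each backslash is written twice):
\s : a whitespace character
[a-z]+ : one or more lowercase letters
\. : a literal period
com : the string "com"
(\.[a-z]+)? : an optional period followed by one or more lowercase letters
\b : a word boundary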
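To see what the pattern actually captures, the matches can be listed before removing them. This is only a sketch: `pattern` and `matched` are illustrative names, and `perl = TRUE` is an assumption added here so that `\\b` follows PCRE semantics, whereas the `gsub()` call above uses R's default regex engine.

```r
x <- "Commerce recommend erkanexample.com.tr. This site erkanexample.com. erkandeneme.com is widely. The company name is apple.commerce is coma. spread"

pattern <- "\\s[a-z]+\\.com(\\.[a-z]+)?\\b"

# Extract every match, including the leading whitespace character.
matched <- regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
print(matched)
# " erkanexample.com.tr" " erkanexample.com" " erkandeneme.com"

# The word boundary is what protects "apple.commerce":
# after ".com" comes another letter, so \b cannot match there.
grepl(pattern, " apple.commerce", perl = TRUE)  # FALSE
grepl(pattern, " apple.com", perl = TRUE)       # TRUE
```

Because the trailing periods are not part of any match, they stay in the text, which is why the result reads "recommend." and "site.".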