How to delete everything that contains ".com" in the string but has a link?

Question

How can I get the expected output from text like the example below?

x <- c("Commerce recommend erkanexample.com.tr. This site erkanexample.com. erkandeneme.com is widely. The company name is apple.commerce is coma. spread")
x <- gsub("(.com)\\S+", "", x)
x
[1] "Commerce r erkanexample This site erkanexample erkandeneme.com is widely. The name is apple is"
Expected:
[1] "Commerce recommend This site. is widely. The company name is apple.commerce is coma. spread"
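Listing what the pattern in the question actually matches shows why it removes too much. This is a minimal base-R sketch (the variable names `loose` and `strict` are only illustrative). The unescaped `.` matches any character, including a space, so `recommend`, `company` and `coma.` are hit. Escaping the dot alone is still not enough: `\\S+` requires at least one more non-space character after `com`, so the bare `erkandeneme.com` is missed while `apple.commerce` is still caught.

```r
x <- "Commerce recommend erkanexample.com.tr. This site erkanexample.com. erkandeneme.com is widely. The company name is apple.commerce is coma. spread"

# Unescaped "." matches ANY character (a space included), so "(.com)\\S+"
# also matches inside "recommend", " company" and " coma."
loose <- regmatches(x, gregexpr("(.com)\\S+", x))[[1]]
print(loose)

# With the dot escaped only a literal ".com" matches, but "\\S+" still needs
# a non-space character after "com": "erkandeneme.com" (followed by a space)
# is skipped, while ".commerce" is still matched
strict <- regmatches(x, gregexpr("\\.com\\S+", x))[[1]]
print(strict)
```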

Answer 1

The stringr package provides functions for basic string operations:

library(stringr)
library(dplyr)

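x %>% 
  str_split(" ") %>%                               # split into words
  unlist() %>% 
  str_subset("\\.com($|\\.)", negate = TRUE) %>%   # drop words where ".com" is followed by "." or the end of the word
  str_c(collapse = " ")                            # join the remaining words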

gives:

"Commerce recommend This site is widely. The company name is apple.commerce is coma. spread"
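The same split, filter and join idea can be written in base R without stringr or dplyr. This is a sketch that assumes words are separated by single spaces, as in the example string; the names `words`, `keep` and `out` are illustrative.

```r
x <- "Commerce recommend erkanexample.com.tr. This site erkanexample.com. erkandeneme.com is widely. The company name is apple.commerce is coma. spread"

# Split on single spaces (fixed = TRUE treats " " literally, not as a regex)
words <- strsplit(x, " ", fixed = TRUE)[[1]]

# Drop words where ".com" is followed by "." or the end of the word;
# "apple.commerce" survives because its ".com" is followed by "m"
keep <- !grepl("\\.com($|\\.)", words)

out <- paste(words[keep], collapse = " ")
out
```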

Edit:

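x %>% 
  str_split(" ") %>% 
  unlist() %>%
  str_subset("\\.com$", negate = TRUE) %>%   # drop words ending in ".com"
  str_replace(".*\\.com.*\\.$", ".") %>%     # turn a sentence-final ".com" word into a lone "."
  str_c(collapse = " ") %>%
  str_replace_all(" \\.", ".")               # attach each lone "." to the preceding word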

gives:

"Commerce recommend. This site. is widely. The company name is apple.commerce is coma. spread"
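The edited pipeline works in three steps, which are easier to follow in a base-R sketch that mirrors them one by one (again assuming single-space separators; the variable names are illustrative). A word that ends a sentence is replaced by a lone `.` instead of being dropped, and that `.` is then glued back onto the preceding word.

```r
x <- "Commerce recommend erkanexample.com.tr. This site erkanexample.com. erkandeneme.com is widely. The company name is apple.commerce is coma. spread"

words <- strsplit(x, " ", fixed = TRUE)[[1]]

# Step 1: drop words that end in ".com" (here "erkandeneme.com")
words <- words[!grepl("\\.com$", words)]

# Step 2: a word containing ".com" and ending in "." becomes a lone ".",
# so the sentence-ending period is kept
words <- sub(".*\\.com.*\\.$", ".", words)

# Step 3: join, then remove the space in front of each lone "."
out <- gsub(" .", ".", paste(words, collapse = " "), fixed = TRUE)
out
```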

Answer 2

Idea: split on spaces, detect which words contain .com, keep the ones that don't, and join them back together.

library(stringr)

x <- c("Commerce recommend erkanexample.com.tr. This site erkanexample.com. erkandeneme.com is widely. The company name is apple.commerce is coma. spread")
split_str <- str_split(x, " ", simplify = FALSE)[[1]]          # words of the single input string
paste(split_str[!grepl("[.]com", split_str)], collapse = " ")  # keep words without ".com" and rejoin them

gives:

"Commerce recommend This site is widely. The company name is is coma. spread"
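Because `[.]com` matches anywhere inside a word, this approach also removes `apple.commerce`, which the expected output keeps. A small base-R comparison (a sketch; `loose`, `strict` and `extra` are illustrative names) shows that requiring `.com` to be followed by a period or the end of the word removes exactly that extra hit.

```r
x <- "Commerce recommend erkanexample.com.tr. This site erkanexample.com. erkandeneme.com is widely. The company name is apple.commerce is coma. spread"
words <- strsplit(x, " ", fixed = TRUE)[[1]]

# Words flagged by the pattern above: any word containing ".com"
loose <- words[grepl("[.]com", words)]

# Words flagged when ".com" must be followed by "." or the end of the word
strict <- words[grepl("[.]com([.]|$)", words)]

# Words that only the looser pattern removes
extra <- setdiff(loose, strict)
extra
```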

Answer 3

Is this what you want?

gsub("\\s[a-z]+\\.com(\\.[a-z]+)?\\b", "", x)
[1] "Commerce recommend. This site. is widely. The company name is apple.commerce is coma. spread"

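Here, we replace the following pattern with nothing:

\s: a whitespace character
[a-z]+: one or more lowercase letters
\.: a literal period
com: the string "com"
(\.[a-z]+)?: an optional period followed by one or more lowercase letters
\b: a word boundary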
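The pieces listed above can be assembled and tried on individual fragments to see what the pattern accepts. This is a sketch; the fragments `" my-site.com"` and the space-less `"erkandeneme.com"` are hypothetical inputs added here to show two limitations: the leading `\s` means a domain at the very start of the string is not matched, and `[a-z]+` does not cover hyphens.

```r
# Answer 3's pattern, built from the pieces explained above
pattern <- paste0("\\s", "[a-z]+", "\\.", "com", "(\\.[a-z]+)?", "\\b")

fragments <- c(" erkanexample.com.tr", # removed: optional ".tr" part matches
               " erkanexample.com",    # removed
               " erkandeneme.com",     # removed
               " apple.commerce",      # kept: no word boundary after "com"
               " coma",                # kept: no "." before "com"
               "erkandeneme.com",      # kept: no whitespace in front
               " my-site.com")         # kept: "-" is not in [a-z]
matched <- grepl(pattern, fragments)
matched
```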
