Extracting a substring from text using R
I have string data like this:
a<- "\n Update Your Profile to Dissolve This Message\nSocial Media Learning and behaviour\n Uploaded on May 3, 2020 at 10:56 in Research\n View Forum\n \n"
From this I need to extract the string "Social Media Learning and behaviour". I used the following code:
gsub(" Uploaded on .* ", "", gsub("\n Update Your Profile to Dissolve This Message\n", "",a))
This gives me the following output:
"Social Media Learning and behaviour\n\n"
I can't get the pattern exactly right. What pattern extracts "Social Media Learning and behaviour" without the trailing "\n\n"?
You can extract the part between "Update Your Profile to Dissolve This Message" and "Uploaded on":
sub(".*Update Your Profile to Dissolve This Message\n(.*)\n\\s+Uploaded on.*", "\\1", a)
#[1] "Social Media Learning and behaviour"
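One thing to keep in mind with sub() is that it returns its input unchanged when the pattern does not match, so a failed extraction can go unnoticed. Below is a minimal base R sketch of the same "text between two markers" idea that returns NA instead. The helper name extract_between() is hypothetical and not part of base R or of the answer above.

```r
a <- "\n Update Your Profile to Dissolve This Message\nSocial Media Learning and behaviour\n Uploaded on May 3, 2020 at 10:56 in Research\n View Forum\n \n"

# extract_between() is a hypothetical helper, not a base R function.
# It returns the text captured by the first group of `pattern`,
# or NA_character_ for elements where the pattern does not match.
extract_between <- function(x, pattern) {
  m <- regmatches(x, regexec(pattern, x))
  vapply(m, function(g) if (length(g) >= 2) g[2] else NA_character_, character(1))
}

# Note the doubled backslash: "\\s" in an R string is the regex \s.
pattern <- "Update Your Profile to Dissolve This Message\n(.*)\n\\s+Uploaded on"

title <- extract_between(a, pattern)
no_match <- extract_between("no markers here", pattern)

print(title)
print(no_match)
```

Because regexec() reports the capture group separately, there is no need for the leading and trailing ".*" that the sub() version uses to discard the rest of the string.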
You can also use str_match from stringr:
stringr::str_match(a, "Update Your Profile to Dissolve This Message\n(.*)\n\\s+Uploaded on")[, 2]
You can capture the previous line in a group and match the following line that contains "Uploaded on":
(.*)\r?\n[^\S\r\n]+Uploaded on
a<- "\n Update Your Profile to Dissolve This Message\nSocial Media Learning and behaviour\n Uploaded on May 3, 2020 at 10:56 in Research\n View Forum\n \n"
stringr::str_match(a, "(.*)\\r?\\n[^\\S\\r\\n]+Uploaded on")
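If stringr is not available, the same pattern can probably be used from base R. This is a sketch under one assumption that is not in the answer above: it passes perl = TRUE so that the PCRE engine is used, where "." does not match a newline and "(.*)" therefore stays on a single line. The first element of the result is the full match and the second is the captured line.

```r
a <- "\n Update Your Profile to Dissolve This Message\nSocial Media Learning and behaviour\n Uploaded on May 3, 2020 at 10:56 in Research\n View Forum\n \n"

# Backslashes are doubled so that the regex engine sees \r, \n and \S.
# [^\S\r\n] matches whitespace other than line breaks (spaces, tabs).
pattern <- "(.*)\\r?\\n[^\\S\\r\\n]+Uploaded on"

# Assumption: perl = TRUE (PCRE), so "." stops at the end of a line.
m <- regmatches(a, regexec(pattern, a, perl = TRUE))[[1]]

full_match <- m[1]  # the captured line plus the start of the "Uploaded on" line
title <- m[2]       # only the captured line

print(title)
```

Unlike the first approach, this pattern does not depend on the "Update Your Profile ..." line at all, so it should keep working if that banner text changes.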