Extracting a substring from text using R

I have string data like this:

a<-  "\n    Update Your Profile to Dissolve This Message\nSocial Media Learning and behaviour\n        Uploaded on May 3, 2020 at 10:56 in Research\n            View Forum\n        \n"

From this I need to extract the string "Social Media Learning and behaviour". I used the following code:

gsub("        Uploaded on .* ", "", gsub("\n    Update Your Profile to Dissolve This Message\n", "",a)) 

This gives me the following output:

"Social Media Learning and behaviour\n\n"

I can't get the pattern exactly right. What pattern extracts "Social Media Learning and behaviour" without the trailing "\n\n"?

You can extract the part between "Update Your Profile to Dissolve This Message" and "Uploaded on":

sub(".*Update Your Profile to Dissolve This Message\n(.*)\n\\s+Uploaded on.*", "\\1", a)
#[1] "Social Media Learning and behaviour"

You can also use str_match from stringr:

stringr::str_match(a, "Update Your Profile to Dissolve This Message\n(.*)\n\\s+Uploaded on")[, 2]
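
One caveat with the sub() approach is that sub() returns its input unchanged when the pattern does not match, so a failed match can pass unnoticed. As a rough sketch of a base R alternative (not part of the original answer), regexec() with regmatches() returns the capture group directly and yields NA when nothing matches. The variable names `pattern`, `m` and `title` are only illustrative, and `[^\n]*` is used here instead of `.*` to make "one line only" explicit.

```r
a <- "\n    Update Your Profile to Dissolve This Message\nSocial Media Learning and behaviour\n        Uploaded on May 3, 2020 at 10:56 in Research\n            View Forum\n        \n"

# Capture the single line that sits between the banner and the "Uploaded on" line.
pattern <- "Update Your Profile to Dissolve This Message\n([^\n]*)\n\\s+Uploaded on"

# regexec() locates the full match and each capture group;
# regmatches() converts those positions into substrings.
m <- regmatches(a, regexec(pattern, a))[[1]]
title <- m[2]  # element 1 is the full match, element 2 is the capture group
title
#[1] "Social Media Learning and behaviour"

# When there is no match, sub() hands back the input, while this approach gives NA.
sub(paste0(".*", pattern, ".*"), "\\1", "no match here")
#[1] "no match here"
regmatches("no match here", regexec(pattern, "no match here"))[[1]][2]
#[1] NA
```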

You can capture the previous line in a group and match the following line that contains "Uploaded on":

(.*)\r?\n[^\S\r\n]+Uploaded on


a<-  "\n    Update Your Profile to Dissolve This Message\nSocial Media Learning and behaviour\n        Uploaded on May 3, 2020 at 10:56 in Research\n            View Forum\n        \n"
stringr::str_match(a, "(.*)\\r?\\n[^\\S\\r\\n]+Uploaded on")
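
str_match() returns a matrix: the first column holds the full match and the second column holds the capture group, so `[, 2]` gives just the title. If you would rather avoid the stringr dependency, the following is a minimal sketch of the same pattern in base R. It assumes R 3.3.0 or later (for the `perl` argument of regexec()), and `extract_title` is a hypothetical helper name, not something from the answer above.

```r
# Hypothetical helper: returns the line that precedes an indented
# "Uploaded on" line, or NA when the input contains no such line.
extract_title <- function(x) {
  pattern <- "(.*)\\r?\\n[^\\S\\r\\n]+Uploaded on"
  # perl = TRUE is needed: with PCRE, "." stops at "\n", so (.*) stays on one line.
  hits <- regmatches(x, regexec(pattern, x, perl = TRUE))
  vapply(
    hits,
    function(h) if (length(h) >= 2) h[2] else NA_character_,
    character(1),
    USE.NAMES = FALSE
  )
}

a <- "\n    Update Your Profile to Dissolve This Message\nSocial Media Learning and behaviour\n        Uploaded on May 3, 2020 at 10:56 in Research\n            View Forum\n        \n"

extract_title(c(a, "nothing to extract"))
#[1] "Social Media Learning and behaviour" NA
```

Because this pattern does not depend on the "Update Your Profile" banner, it keeps working if that first line changes, as long as the title is immediately followed by an indented "Uploaded on" line.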