Extracting text after "?"
I have a string:
x <- "Name of the Student? Michael Sneider"
I want to extract "Michael Sneider" from it.
I have tried:
str_extract_all(x,"[a-z]+")
str_extract_all(data,"\\?[a-z]+")
But neither call extracts the name.
I think this should help:
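# "?" is a regex metacharacter, so match it literally with fixed(); trimws() drops the leading space
trimws(substr(x, str_locate(x, fixed("?"))[, "end"] + 1, nchar(x)))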
Try this:
sub('.*\\?\\s*(.*)', '\\1', x)
str_match is more helpful in this case:
str_match(x, ".*\\?\\s(.*)")[, 2]
#[1] "Michael Sneider"
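If only the name is needed, stringr can also return it directly instead of through a capture group. A minimal sketch, assuming stringr is installed; the variable names `name_lookbehind` and `name_removed` are illustrative only:

```r
library(stringr)

x <- "Name of the Student? Michael Sneider"

# Lookbehind: keep everything that follows a literal "?" plus one whitespace character
name_lookbehind <- str_extract(x, "(?<=\\?\\s).*")

# Alternatively, delete everything up to and including the "?" and any whitespace after it
name_removed <- str_remove(x, ".*\\?\\s*")
```

For comparison, the attempts in the question fail because `[a-z]+` matches only runs of lowercase letters, so it can never cover the capital letters or the space in the name, and `\\?[a-z]+` requires a lowercase letter immediately after the "?", where the string has a space.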
x <- "Name of the Student? Michael Sneider"
sub(pattern = ".+?\\?\\s*", x, replacement = '')
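The same result is possible in base R without escaping the "?" at all, by treating it as a fixed string. A rough sketch, assuming the string contains exactly one "?"; `name_split` and `name_regmatches` are hypothetical names used only here:

```r
x <- "Name of the Student? Michael Sneider"

# Split on the literal "?" (fixed = TRUE disables regex), take the second piece, trim spaces
name_split <- trimws(strsplit(x, "?", fixed = TRUE)[[1]][2])

# Or pull out the match itself with a Perl-style lookbehind
name_regmatches <- regmatches(x, regexpr("(?<=\\? ).*", x, perl = TRUE))
```

With more than one "?" in the input, the `strsplit` version would return only the text between the first and second "?", so the regex versions are probably the safer choice for messier data.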
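To take advantage of the loose wording of the question, we can go way overboard and use natural language processing to extract all names from the string: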
library(openNLP)
library(NLP)
# you'll also have to install the models with the next line, if you haven't already
# install.packages('openNLPmodels.en', repos = 'http://datacube.wu.ac.at/', type = 'source')
s <- as.String(x) # convert x to NLP package's String object
# make annotators
sent_token_annotator <- Maxent_Sent_Token_Annotator()
word_token_annotator <- Maxent_Word_Token_Annotator()
entity_annotator <- Maxent_Entity_Annotator()
# call sentence and word annotators
s_annotated <- annotate(s, list(sent_token_annotator, word_token_annotator))
# call entity annotator (which defaults to "person") and subset the string
s[entity_annotator(s, s_annotated)]
## Michael Sneider
Overkill? Probably. But it's interesting, and really not that hard to implement.