Breaking a paragraph into a vector of sentences in R
I have the following paragraph:
Well, um...such a personal topic. No wonder I am the first to write a review. Suffice to say this stuff does just what they claim and tastes pleasant. And I had, well, major problems in this area and now I don't. 'Nuff said. :-)
In order to apply the calculate_total_presence_sentiment command from the RSentiment package, I would like to split this paragraph into a vector of sentences, like so:
[1] "Well, um...such a personal topic."
[2] "No wonder I am the first to write a review."
[3] "Suffice to say this stuff does just what they claim and tastes pleasant."
[4] "And I had, well, major problems in this area and now I don't."
[5] "'Nuff said."
[6] ":-)"
Thanks for any help with this.
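For context, once the paragraph is split into sentences, the RSentiment call would look roughly like this (a minimal sketch; it assumes the RSentiment package is installed and that calculate_total_presence_sentiment accepts a character vector with one sentence per element, as its documentation describes):

```r
# Sketch only: the sentence vector here is typed by hand to stand in
# for whatever splitting method is used below.
library(RSentiment)

sentences <- c("Well, um...such a personal topic.",
               "No wonder I am the first to write a review.")

# Tallies how many sentences fall into each sentiment category.
calculate_total_presence_sentiment(sentences)
```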
qdap has a handy function for this:
sent_detect_nlp - Detect and split sentences on endmark boundaries using openNLP & NLP utilities, which matches the old version of the openNLP package's now-removed sentDetect function.
library(qdap)
txt <- "Well, um...such a personal topic. No wonder I am the first to write a review. Suffice to say this stuff does just what they claim and tastes pleasant. And I had, well, major problems in this area and now I don't. 'Nuff said. :-)"
sent_detect_nlp(txt)
#[1] "Well, um...such a personal topic."
#[2] "No wonder I am the first to write a review."
#[3] "Suffice to say this stuff does just what they claim and tastes pleasant."
#[4] "And I had, well, major problems in this area and now I don't."
#[5] "'Nuff said."
#[6] ":-)"
A quick-and-dirty solution
> data <- "Well, um...such a personal topic. No wonder I am the first to write a review. Suffice to say this stuff does just what they claim and tastes pleasant. And I had, well, major problems in this area and now I don't. 'Nuff said. :-)"
> ?"regular expression"
> strsplit(data, "(?<=[^.][.][^.])", perl=TRUE)
[[1]]
[1] "Well, um...such a personal topic. "
[2] "No wonder I am the first to write a review. "
[3] "Suffice to say this stuff does just what they claim and tastes pleasant. "
[4] "And I had, well, major problems in this area and now I don't. "
[5] "'Nuff said. "
[6] ":-)"
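Note that this split leaves a trailing space on each element. If that matters for downstream scoring, base R's trimws() can clean it up (a small follow-on sketch using the same lookbehind pattern):

```r
data <- "Well, um...such a personal topic. No wonder I am the first to write a review."

# strsplit() returns a list; take the first element, then strip the
# leading/trailing whitespace that the lookbehind split leaves behind.
sentences <- trimws(strsplit(data, "(?<=[^.][.][^.])", perl = TRUE)[[1]])
sentences
```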
Using tools from https://cran.r-project.org/web/views/NaturalLanguageProcessing.html
You can save the text in a .txt file. Make sure each line of the .txt file contains exactly one sentence that should become an element of the vector.
Then use the base function readLines('filepath/filename.txt').
The resulting character vector contains each line of the original text file as one element.
> mylines <- readLines('text.txt')
Warning message:
In readLines("text.txt") : incomplete final line found on 'text.txt'
> mylines
[1] "Well, um...such a personal topic."
[2] "No wonder I am the first to write a review."
[3] "Suffice to say this stuff does just what they claim and tastes pleasant."
[4] "And I had, well, major problems in this area and now I don't."
[5] "'Nuff said."
[6] ":-)"
> mylines[3]
[1] "Suffice to say this stuff does just what they claim and tastes pleasant."
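The "incomplete final line" warning only means the file does not end with a newline; the data is still read correctly. To silence it, readLines() takes a warn argument:

```r
# warn = FALSE suppresses the "incomplete final line" message;
# adding a trailing newline to text.txt would also make it go away.
mylines <- readLines('text.txt', warn = FALSE)
```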