Anyone have a way to tokenize a paragraph, put each sentence into a pandas data frame, and perform sentiment analysis on each?

Beginner NLP/Python programmer here. The title says it all. I basically need code that tokenizes a paragraph into sentences, runs sentiment analysis on each sentence, and puts each sentence along with its rating into a pandas data frame. I already have code that can tokenize a paragraph and even perform sentiment analysis, but I'm struggling to get the two into a data frame. So far I have:

I extracted the URL and text using newspaper3k.

from newspaper import fulltext
import requests
url = "https://www.click2houston.com/news/local/2021/06/18/houston-water-wastewater-proposed-increase-this-is-what-mayor-sylvester-turner-wants-you-to-know/"
text = fulltext(requests.get(url).text)

Then I summarized the article text with the BERT extractive summarizer.

from summarizer import Summarizer  # bert-extractive-summarizer package

models = Summarizer()
result = models(text, min_length=30)
full = "".join(result)

Then I tokenized the summary into sentences using nltk.

import nltk
from nltk.tokenize import sent_tokenize

nltk.download('punkt')  # needed once for the sentence tokenizer
tokens = sent_tokenize(full)

Finally, I put it into a basic dataframe.

import pandas as pd

df = pd.DataFrame(tokens, columns=['sentences'])

The only thing I'm missing is the sentiment analysis. I just need a sentiment rating (preferably from BERT) for each sentence, added into the data frame.
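To show the shape I'm after, here's a minimal sketch of just the dataframe mechanics. `score_sentence` is a hypothetical placeholder, not a real model; any callable returning the `{'label': ..., 'score': ...}` dict shape that Hugging Face sentiment pipelines produce could be dropped in instead.

```python
import pandas as pd

# Hypothetical stand-in scorer: returns the same {'label', 'score'} dict
# shape a Hugging Face sentiment pipeline produces for one sentence.
def score_sentence(sentence):
    positive = {"good", "great", "increase"}
    hits = sum(word.lower().strip(".,") in positive for word in sentence.split())
    label = "POSITIVE" if hits else "NEGATIVE"
    return {"label": label, "score": min(1.0, 0.5 + 0.1 * hits)}

sentences = [
    "The proposed increase is great news for infrastructure.",
    "Residents worry about the cost.",
]
df = pd.DataFrame(sentences, columns=["sentences"])

# Score each sentence, then unpack the dicts into flat columns
scored = df["sentences"].apply(score_sentence)
df["label"] = scored.apply(lambda d: d["label"])
df["score"] = scored.apply(lambda d: d["score"])
print(df)
```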

Use the transformers pipeline.

Pipelines exist for both sentiment and summarization:

https://huggingface.co/course/chapter1/3?fw=pt

Hugging Face lets you do pretty much whatever you want:

from transformers import pipeline
from newspaper import fulltext
import requests
import pandas as pd

url = "https://www.click2houston.com/news/local/2021/06/18/houston-water-wastewater-proposed-increase-this-is-what-mayor-sylvester-turner-wants-you-to-know/"
text = fulltext(requests.get(url).text)

# First ten non-empty lines of the article
texts = [item.strip() for item in text.split('\n')[:10] if item.strip()]

summarizer = pipeline("summarization")
sentiment_analyser = pipeline("sentiment-analysis")

summarize = lambda t: summarizer(t, min_length=5, max_length=30)[0]['summary_text']
analyse = lambda t: sentiment_analyser(t)[0]

df = pd.DataFrame(texts, columns=['lines'])
df['Summarized'] = df.lines.apply(summarize)
df['Sentiment'] = df.lines.apply(analyse)
print(df.head())
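One note on the output shape: the sentiment pipeline returns a list of `{'label', 'score'}` dicts per input, so a column built from it holds nested objects. If you want flat label/score columns, you can unpack afterwards. A sketch with hard-coded sample outputs (the dict values below are illustrative, not real model scores):

```python
import pandas as pd

# Shape of what pipeline('sentiment-analysis') returns per input text:
# a one-element list of {'label', 'score'} dicts (values here are made up).
raw = [
    [{"label": "POSITIVE", "score": 0.998}],
    [{"label": "NEGATIVE", "score": 0.951}],
]
df = pd.DataFrame({"lines": ["good line", "bad line"], "Sentiment": raw})

# Pull the single dict out of each list and flatten it into columns
df["label"] = df["Sentiment"].apply(lambda r: r[0]["label"])
df["score"] = df["Sentiment"].apply(lambda r: r[0]["score"])
print(df[["lines", "label", "score"]])
```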