How to extract a comment with BeautifulSoup?
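I am completely new to Python and data mining, so I have a question about extracting one part of some output. I am using Python 3.6 and updated everything this morning. I have anonymized the output and removed all lines containing passwords, tokens, etc.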
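from bs4 import BeautifulSoup

# Open the saved page in a context manager so the file handle is closed afterwards.
with open("facebookoutput.html") as f:
    soup = BeautifulSoup(f, "html.parser")
# find_all is the current name for the older findAll alias.
comments = soup.find_all('div', class_="_2b06")
print(comments[0])  # prints the first entry: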
<div class="_2b06"><div class="_2b05"><a href="/stuartd?fref=nf&amp;rc=p&amp;__tn__=R-R">some Name </a></div><div data-commentid="100000000000000000222222000000000000000" data-sigil="comment-body">There is nice comment. I like Whosebug. </div></div>
I am stuck on getting `There is nice comment. I like Whosebug.` out of it.
Thanks in advance.
Try this:
from bs4 import BeautifulSoup
content="""
<div class="_2b06"><div class="_2b05"><a href="/stuartd?fref=nf&amp;rc=p&amp;__tn__=R-R">some Name </a></div><div data-commentid="100000000000000000222222000000000000000" data-sigil="comment-body">There is nice comment. I like Whosebug. </div></div>
"""
soup = BeautifulSoup(content, "html.parser")
comments = ' '.join([item.text for item in soup.select("[data-sigil='comment-body']")])
print(comments)
Output:
There is nice comment. I like Whosebug.
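If you need each comment separately rather than one joined string, a variation along these lines might help. It is only a sketch: the helper name `extract_comments` is made up for illustration, and it assumes every comment sits inside a `div` with class `_2b06`, with the author in the first `a` tag and the text in the `data-sigil="comment-body"` element, as in the sample from the question. The real page may differ.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question (a single comment block).
html = """
<div class="_2b06"><div class="_2b05"><a href="/stuartd?fref=nf&amp;rc=p&amp;__tn__=R-R">some Name </a></div><div data-commentid="100000000000000000222222000000000000000" data-sigil="comment-body">There is nice comment. I like Whosebug. </div></div>
"""


def extract_comments(markup):
    """Return a list of (author, text) tuples, one per comment block.

    Hypothetical helper: the class name and attribute are taken from the
    sample in the question and are assumptions about the real page.
    """
    soup = BeautifulSoup(markup, "html.parser")
    results = []
    for block in soup.find_all("div", class_="_2b06"):
        body = block.find("div", attrs={"data-sigil": "comment-body"})
        if body is None:
            continue  # skip blocks that have no comment body
        author = block.find("a")
        name = author.get_text(strip=True) if author else None
        # strip=True removes the trailing whitespace seen in the sample
        results.append((name, body.get_text(strip=True)))
    return results


for name, text in extract_comments(html):
    print(f"{name}: {text}")
```

Searching inside each `_2b06` block keeps the author and the comment text paired, which the joined-string approach does not.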