Beautiful Soup and requests problem: it doesn't show any text output
I am using Beautiful Soup and requests to print the full text of the article on this website:
https://www.vanityfair.com/style/society/2014/06/monica-lewinsky-humiliation-culture
Here is my code:
import requests
from bs4 import BeautifulSoup
url = requests.get("https://www.vanityfair.com/style/society/2014/06/monica-lewinsky-humiliation-culture")
html = url.text
page = BeautifulSoup(html, 'html.parser')
match = page.find_all('div', 'parbase cn_text')
page_list = [[k.get_text() for k in i.find_all('p')] for i in match]
for i in page_list[:-2]:
    for k in i:
        print(k + '\n')
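My code runs without any errors, but it doesn't show any text in the output. Please help me find my mistake.
To get the article data, select the div that has the class article__chunks. The article text belongs to it.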
import requests
from bs4 import BeautifulSoup
url = requests.get("https://www.vanityfair.com/style/society/2014/06/monica-lewinsky-humiliation-culture")
html = url.text
page = BeautifulSoup(html, 'html.parser')
match = page.find('div', {'class': 'article__chunks'})
# find() returns a single Tag, so search it for <p> tags directly;
# iterating over the Tag itself would also yield bare text nodes
for p in match.find_all('p'):
    print(p.get_text() + '\n')
What happens?
You call find_all() for a div with two classes that do not exist on the page, so match is empty and the loops have nothing to print.
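The empty result can be reproduced offline. The sketch below is only an illustration: SAMPLE_HTML is hypothetical stand-in markup (the real page's HTML may differ), and it simply contrasts a class string that is absent with one that is present.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup; the real page's HTML may differ.
SAMPLE_HTML = """
<div class="article__chunks">
  <p>First paragraph.</p>
  <p>Second paragraph.</p>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# No div carries this class string, so find_all returns an empty result.
missing = soup.find_all("div", "parbase cn_text")
# A class that does exist yields one matching div.
present = soup.find_all("div", "article__chunks")

# Same nested comprehension as in the question.
missing_texts = [[p.get_text() for p in div.find_all("p")] for div in missing]
present_texts = [[p.get_text() for p in div.find_all("p")] for div in present]

print(len(missing), missing_texts)
print(len(present), present_texts)
```

Because an empty result is not an error, the loops over it run zero times, which matches the "no errors, no output" symptom in the question.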
How to fix it?
Use the correct pattern. I use a CSS selector to avoid the extra loops:
select('article.article.main-content p')
The list comprehension looks like:
[p.get_text() for p in page.select('article.article.main-content p')]
Working example
import requests
from bs4 import BeautifulSoup
url = requests.get("https://www.vanityfair.com/style/society/2014/06/monica-lewinsky-humiliation-culture")
html = url.text
page = BeautifulSoup(html, 'html.parser')
print(*[p.get_text() for p in page.select('article.article.main-content p')], sep='\n')
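To try the selector without hitting the network, here is a minimal sketch run against an inline HTML string. SAMPLE_HTML is an assumed, simplified structure and not the actual page source; it only shows that the descendant selector gathers every p under the article in one flat list and skips paragraphs outside it.

```python
from bs4 import BeautifulSoup

# Assumed, simplified structure; not copied from the real page.
SAMPLE_HTML = """
<article class="article main-content">
  <div class="article__chunks"><p>First paragraph.</p></div>
  <div class="article__chunks"><p>Second paragraph.</p></div>
</article>
<footer><p>Not part of the article.</p></footer>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# "article.article.main-content p" matches every <p> nested anywhere inside
# an <article> that has both classes, so no inner loop over divs is needed.
paragraphs = [p.get_text() for p in soup.select("article.article.main-content p")]

print(*paragraphs, sep="\n")
```

If the real page uses different class names, only the selector string needs to change.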