Why Is Beautiful Soup Only Giving Me the First Entry in a Website?
I'm trying to get the title, date, and author of the articles listed on the following website: https://coreyms.com/
Here is the code I'm running:
from bs4 import BeautifulSoup
import requests
import lxml
import csv
source = requests.get('http://coreyms.com').text
soup = BeautifulSoup(source, 'lxml')

for match in soup.find_all('div', class_='site-container'):
    headline = match.main.header.h2.a.text
    print(headline)
    date = match.main.header.p.time.text
    print(date)
    author = match.main.header.p.span.a.span.text
    print(author)
    print()
However, when I run this code, I only get the information for the first item. Any help would be greatly appreciated. Thanks!
Try this approach. The reason your loop only prints one entry is that find_all('div', class_='site-container') matches just the single wrapper div around the whole page, and attribute access such as match.main.header.h2.a then navigates to the first matching tag only. Iterating over the h2 headings instead gives you one pass per article:
match = soup.find_all('h2')
for i in match:
    print(i.text)
    print(i.nextSibling.nextSibling.find('time').text)
    print(i.nextSibling.nextSibling.find('span').text)
    print('====')
Output:
Python Threading Tutorial: Run Code Concurrently Using the Threading Module
September 12, 2019
Corey Schafer
====
Update (2019-09-03)
September 3, 2019
Corey Schafer
====
and so on.
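For context, the double nextSibling is needed because the first sibling of each h2 is a whitespace-only text node sitting before the entry-meta paragraph. A slightly more robust sketch of the same idea uses find_next, which walks forward through the document and skips text nodes; the class name entry-author-name is taken from the site's current markup (as used in the answer below) and may change:

for h2 in soup.find_all('h2'):
    # find_next searches forward in document order, so whitespace text
    # nodes between the heading and the meta paragraph don't matter.
    print(h2.text)
    print(h2.find_next('time').text)
    print(h2.find_next('span', class_='entry-author-name').text)
    print('====')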
I would go with a faster approach: select all the article tags, then loop over them and use class selectors to grab the required info.
import requests
from bs4 import BeautifulSoup as bs
r = requests.get('https://coreyms.com/')
soup = bs(r.content, 'lxml')
for article in soup.select('article'):
    title = article.select_one('.entry-title-link').text
    date = article.select_one('.entry-time').text
    author = article.select_one('.entry-author-name').text
    print(title, date, author)
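Since the question already imports csv, here is a minimal sketch of writing those same fields to a file instead of printing them; the file name cms_scrape.csv is just an illustrative choice:

import csv
import requests
from bs4 import BeautifulSoup as bs

r = requests.get('https://coreyms.com/')
soup = bs(r.content, 'lxml')

# 'cms_scrape.csv' is an arbitrary output file name used for this example.
with open('cms_scrape.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['title', 'date', 'author'])
    # One row per <article> tag, using the same class selectors as above.
    for article in soup.select('article'):
        writer.writerow([
            article.select_one('.entry-title-link').text,
            article.select_one('.entry-time').text,
            article.select_one('.entry-author-name').text,
        ])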