How can I extract a table from Wikipedia using Beautiful Soup

Question

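I'm trying to write a scraper that extracts tables from this Wikipedia page. The problem is that I can extract every table on the page except the one I actually need (the table containing statistics for every presidential election the US has ever held). I don't think the problem is with my tags.
Here is my code: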

from urllib.error import HTTPError
from urllib.error import URLError
from bs4 import BeautifulSoup
from urllib.request import urlopen

#getting the wiki page
page_info=urlopen('https://en.wikipedia.org/wiki/United_States_presidential_election')

soup=BeautifulSoup(page_info, 'html.parser')

headline=soup.find('table', "wikitable sortable jquery-tablesorter")
print(headline)

I think I'm missing something important, but I can't wrap my head around it. Can someone help me?

Answer 1

One way to do this:

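from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup

# Fetch the raw HTML as the server sends it (no JavaScript is executed)
page = requests.get('https://en.wikipedia.org/wiki/United_States_presidential_election').text
soup = BeautifulSoup(page, 'html.parser')
# Without JavaScript the table's class is just "wikitable sortable"
table = soup.find('table', class_="wikitable sortable")

# read_html returns a list of DataFrames; recent pandas versions warn
# when given a literal HTML string, so wrap it in StringIO
df = pd.read_html(StringIO(str(table)))
df = pd.concat(df)
print(df)
df.to_csv("elections.csv", index=False)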

Output:

     Year                                    Party  ... Electoral votes      Notes
0    1788                              Independent  ...        69 / 138        NaN
1    1788                               Federalist  ...        34 / 138        NaN
2    1788                               Federalist  ...         9 / 138        NaN
3    1788                               Federalist  ...         6 / 138        NaN
4    1788                               Federalist  ...         6 / 138        NaN
..    ...                                      ...  ...             ...        ...
[219 rows x 8 columns]

The script also writes the same table to elections.csv.

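Note: whenever you scrape, make sure to turn off JS (JavaScript) in your browser while inspecting the page. BeautifulSoup doesn't see dynamically rendered content, only the HTML the server sends. That is why your code returns nothing: without JS, the class of the tag you're after is different. The jquery-tablesorter class is added by JavaScript after the page loads, so the static HTML only has "wikitable sortable".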
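To make the class mismatch concrete, here is a minimal sketch that needs no network access. The inline HTML is a hypothetical, heavily simplified stand-in for what the server sends (not the real page markup), and the variable names are only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the static HTML: note there is no
# "jquery-tablesorter" class, since that is only added in the browser.
STATIC_HTML = """
<table class="wikitable sortable">
  <tr><th>Year</th><th>Party</th></tr>
  <tr><td>1788</td><td>Independent</td></tr>
</table>
"""

soup = BeautifulSoup(STATIC_HTML, "html.parser")

# Class string copied from the browser inspector: no tag has this
# exact class attribute, so find() returns None.
from_inspector = soup.find("table", "wikitable sortable jquery-tablesorter")

# Class string as served: matches the full class attribute exactly.
as_served = soup.find("table", class_="wikitable sortable")

# A CSS selector checks each class separately, so it does not depend
# on class order or on extra classes being present.
by_selector = soup.select_one("table.wikitable.sortable")

print(from_inspector)            # None
print(as_served is not None)     # True
print(by_selector is not None)   # True
```

If you'd rather not depend on the exact class string, the select_one form is probably the more forgiving choice, since it would still match if the page later gained or reordered classes.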

urllib

beautifulsoup

web-scraping

python-3.x