Webscrape a table with BeautifulSoup

Question

I am trying to get the tables (and then their tr and td contents) from this link: https://www.basketball-reference.com/teams/PHI/2022/lineups/ using requests and BeautifulSoup, but I am not getting any results.

I tried:

import requests
from bs4 import BeautifulSoup

url = "https://www.basketball-reference.com/teams/PHI/2022/lineups/"
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser') 

tables = soup.find_all('table')

But tables comes back as [].

Answer 1

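It looks like the tables are placed inside HTML comments (<!-- ... -->). BeautifulSoup treats everything inside a comment as a single Comment string rather than as tags, so find_all('table') finds nothing. Strip the comment markers from the response text before parsing: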

page = page.text.replace("<!--","").replace("-->","")
soup = BeautifulSoup(page, 'html.parser')
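To see why this works without hitting the network, here is a minimal offline sketch. The HTML string is a hypothetical stand-in for the page (not its real markup); it only mimics the pattern of a table wrapped in a comment. Note that the string replace un-comments every comment on the page, not just the ones holding tables, which is usually harmless but may pull unrelated markup into the parse.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page: the table sits inside an HTML comment,
# so the parser stores it as one Comment string instead of <table> tags.
SAMPLE_HTML = """
<div id="all_sample">
<!--
<table id="sample">
  <tr><th>Lineup</th><th>MP</th></tr>
  <tr><td>Lineup 1</td><td>120:00</td></tr>
</table>
-->
</div>
"""

# Parsing as-is: the commented-out table is invisible to find_all().
before = BeautifulSoup(SAMPLE_HTML, "html.parser").find_all("table")

# Removing the comment markers turns the table back into real markup.
uncommented = SAMPLE_HTML.replace("<!--", "").replace("-->", "")
after = BeautifulSoup(uncommented, "html.parser").find_all("table")

print(len(before), len(after))  # 0 1
```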

Example

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = "https://www.basketball-reference.com/teams/PHI/2022/lineups/"
page = requests.get(url)
page = page.text.replace("<!--","").replace("-->","")
soup = BeautifulSoup(page, 'html.parser') 

tables = soup.find_all('table')
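The question also asks for the tr and td contents. Once find_all('table') returns results, walking the rows could look roughly like this. The HTML below is a hypothetical stand-in for the uncommented page, and table_rows is an illustrative helper name, not part of BeautifulSoup. It collects th cells as well as td cells, on the assumption that header and label cells may be marked up as th.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the uncommented page text.
html = """
<table id="sample">
  <thead><tr><th>Lineup</th><th>MP</th></tr></thead>
  <tbody>
    <tr><td>Lineup 1</td><td>120:00</td></tr>
    <tr><td>Lineup 2</td><td>95:30</td></tr>
  </tbody>
</table>
"""

def table_rows(table):
    """Return each <tr> of a table as a list of its stripped cell texts."""
    rows = []
    for tr in table.find_all("tr"):
        # Include <th> as well as <td>, since header cells are often <th>.
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:  # skip rows that contain no cells
            rows.append(cells)
    return rows

soup = BeautifulSoup(html, "html.parser")
for table in soup.find_all("table"):
    for row in table_rows(table):
        print(row)
```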

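In addition, as @chitown88 pointed out, you can use BeautifulSoup's Comment class to find all the comments in the HTML. Note that the matching comments are plain strings, so you have to parse them with BeautifulSoup again: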

soup.find_all(string=lambda text: isinstance(text, Comment) and '<table' in text)

Example

import requests
from bs4 import BeautifulSoup
from bs4 import Comment

url = "https://www.basketball-reference.com/teams/PHI/2022/lineups/"
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser') 

soupTables = BeautifulSoup(''.join(soup.find_all(string=lambda text: isinstance(text, Comment) and '<table' in text)), 'html.parser')
soupTables.find_all('table')
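soupTables above only contains what was inside the comments, so any table that is already normal markup on the page would not be in it. A sketch that gathers both kinds is below. all_tables is a hypothetical helper name, and the sample HTML is made up for illustration; it parses each matching comment separately rather than joining them, which keeps every comment a self-contained fragment.

```python
from bs4 import BeautifulSoup, Comment

def all_tables(html):
    """Collect <table> tags that are either normal markup or hidden inside comments."""
    soup = BeautifulSoup(html, "html.parser")
    tables = list(soup.find_all("table"))  # tables already in the DOM
    commented = soup.find_all(
        string=lambda text: isinstance(text, Comment) and "<table" in text
    )
    for comment in commented:
        # A Comment is plain text, so parse each one as its own small document.
        tables.extend(BeautifulSoup(str(comment), "html.parser").find_all("table"))
    return tables

# Hypothetical sample: one visible table and one commented-out table.
html = """
<table id="visible"><tr><td>a</td></tr></table>
<!-- <table id="hidden"><tr><td>b</td></tr></table> -->
"""

ids = [table.get("id") for table in all_tables(html)]
print(ids)  # ['visible', 'hidden']
```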
