Web scrape a table with BeautifulSoup

I am trying to get the table (and then its tr and td contents) from this link: https://www.basketball-reference.com/teams/PHI/2022/lineups/ using requests and BeautifulSoup, but I get no results.

I tried:

import requests
from bs4 import BeautifulSoup

url = "https://www.basketball-reference.com/teams/PHI/2022/lineups/"
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser') 

tables = soup.find_all('table') 

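But the result of tables is [].

It looks like the tables are placed inside HTML comments, so the parser never sees them as elements. You have to adjust the response text by removing the comment markers before parsing: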

page = page.text.replace("<!--","").replace("-->","")
soup = BeautifulSoup(page, 'html.parser') 

Example

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = "https://www.basketball-reference.com/teams/PHI/2022/lineups/"
page = requests.get(url)
page = page.text.replace("<!--","").replace("-->","")
soup = BeautifulSoup(page, 'html.parser') 

tables = soup.find_all('table') 
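
Once the comment markers are gone, the tr and td contents the question asks about can be read in the usual way. Below is a minimal sketch that runs offline; SAMPLE_HTML, the table id and the helper name table_rows are made up for illustration and only imitate how the real page wraps its table in a comment.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: the table sits inside an HTML comment.
SAMPLE_HTML = """
<div id="all_lineups">
<!--
<table id="lineups">
  <thead><tr><th>Lineup</th><th>MP</th></tr></thead>
  <tbody>
    <tr><th>A | B</th><td>100:30</td></tr>
    <tr><th>C | D</th><td>80:15</td></tr>
  </tbody>
</table>
-->
</div>
"""


def table_rows(html):
    """Strip the comment markers, then return each row as a list of cell texts."""
    uncommented = html.replace("<!--", "").replace("-->", "")
    soup = BeautifulSoup(uncommented, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    return [
        [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        for tr in table.find_all("tr")
    ]


rows = table_rows(SAMPLE_HTML)
for row in rows:
    print(row)
```

Note that removing every comment marker also exposes any other commented-out markup on the page, so it is worth selecting the table you want by id or position rather than assuming it is the only one.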

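In addition, as @chitown88 mentioned, BeautifulSoup has a Comment class that lets you find all comments in the HTML. Note that a comment is only a string, so you have to parse it with BeautifulSoup again to get at the tables inside it: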

soup.find_all(string=lambda text: isinstance(text, Comment) and '<table' in text)

Example

import requests
from bs4 import BeautifulSoup
from bs4 import Comment

url = "https://www.basketball-reference.com/teams/PHI/2022/lineups/"
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser') 

# Join all comments that contain a table and parse them as a new document
soupTables = BeautifulSoup(''.join(soup.find_all(string=lambda text: isinstance(text, Comment) and '<table' in text)), 'html.parser')
soupTables.find_all('table')
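
If the page hides several tables, it may be handier to parse each comment on its own and index the tables by their id attribute. This is only a sketch under assumptions: SAMPLE_HTML, the ids "lineups" and "totals", and the helper name commented_tables are hypothetical and not taken from the real page.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical markup: one plain comment and two comments that each hide a table.
SAMPLE_HTML = """
<div><!-- just a note --></div>
<div><!-- <table id="lineups"><tr><td>1</td></tr></table> --></div>
<div><!-- <table id="totals"><tr><td>2</td></tr></table> --></div>
"""


def commented_tables(html):
    """Return a dict mapping table id to the table Tag found inside HTML comments."""
    soup = BeautifulSoup(html, "html.parser")
    tables = {}
    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        if "<table" not in comment:
            continue  # ordinary comment, nothing to parse
        # A Comment is just a string, so it has to be parsed again.
        inner = BeautifulSoup(comment, "html.parser")
        for table in inner.find_all("table"):
            tables[table.get("id")] = table
    return tables


found = commented_tables(SAMPLE_HTML)
print(list(found))
```

Compared with stripping the markers from the whole response, this leaves the rest of the document untouched and only re-parses the comments that actually contain a table.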