Web-scrape a table with BeautifulSoup
I am trying to get the tables (and then the tr and td contents) from this link: https://www.basketball-reference.com/teams/PHI/2022/lineups/ with requests and BeautifulSoup, but I don't get any results.
I tried:
import requests
from bs4 import BeautifulSoup
url = "https://www.basketball-reference.com/teams/PHI/2022/lineups/"
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')
tables = soup.find_all('table')
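But the result for tables is [].
It looks like the tables are placed inside HTML comments, so you have to adjust the response text before parsing it: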
page = page.text.replace("<!--","").replace("-->","")
soup = BeautifulSoup(page, 'html.parser')
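A minimal offline sketch of why the original attempt returns [] and why removing the markers helps. The `html` string is a hypothetical stand-in for `page.text`, assuming the real page wraps its tables in `<!-- ... -->` the way the answer describes:

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical stand-in for page.text: the table is wrapped in <!-- ... -->
html = """<div id="wrapper">
<!--
<table id="demo"><tr><th>Lineup</th><td>MP</td></tr></table>
-->
</div>"""

# Parsed as-is, the table markup is stored as one Comment string, not as a <table> tag
before = BeautifulSoup(html, "html.parser")
print(before.find_all("table"))  # []

# The markup is still in the document, just as comment text
comments = before.find_all(string=lambda text: isinstance(text, Comment))
print(len(comments))  # 1

# After stripping the comment markers, the same markup parses as a real table
after = BeautifulSoup(html.replace("<!--", "").replace("-->", ""), "html.parser")
print(len(after.find_all("table")))  # 1
```

Note that the replace trick un-comments every comment on the page, not only the ones holding tables; for this page that is harmless, but the `Comment`-based approach is more targeted.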
Example
import requests
from bs4 import BeautifulSoup
import pandas as pd
url = "https://www.basketball-reference.com/teams/PHI/2022/lineups/"
page = requests.get(url)
# remove the comment markers so the hidden tables are parsed as normal HTML
page = page.text.replace("<!--","").replace("-->","")
soup = BeautifulSoup(page, 'html.parser')
tables = soup.find_all('table')
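The question also asks for the tr and td contents. A minimal sketch, assuming rows mix `<th>` and `<td>` cells (so both are collected in document order); `table_rows` is a hypothetical helper and the `page` string is a stand-in for the real response text:

```python
from bs4 import BeautifulSoup

def table_rows(table):
    """Return the stripped text of every cell, row by row, for one bs4 <table> tag."""
    rows = []
    for tr in table.find_all("tr"):
        # Header cells (<th>) and data cells (<td>) are both kept, in document order
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows

# Hypothetical page text with a commented-out table
page = """<!--
<table>
  <tr><th>Lineup</th><th>MP</th></tr>
  <tr><th>A | B</th><td>120:00</td></tr>
</table>
-->"""

# Same trick as in the example: drop the comment markers, then parse
page = page.replace("<!--", "").replace("-->", "")
soup = BeautifulSoup(page, "html.parser")
tables = soup.find_all("table")

rows = table_rows(tables[0])
print(rows)  # [['Lineup', 'MP'], ['A | B', '120:00']]
```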
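In addition, as @chitown88 mentioned, you can use BeautifulSoup's Comment class to find all comments in the HTML. Note that the matching comment strings have to be parsed into a BeautifulSoup object again: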
soup.find_all(string=lambda text: isinstance(text, Comment) and '<table' in text)
Example
import requests
from bs4 import BeautifulSoup
from bs4 import Comment
url = "https://www.basketball-reference.com/teams/PHI/2022/lineups/"
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')
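# join the comments that contain table markup and parse them again (parser given explicitly)
soupTables = BeautifulSoup(''.join(soup.find_all(string=lambda text: isinstance(text, Comment) and '<table' in text)), 'html.parser')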
soupTables.find_all('table')
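Since a page can contain both ordinary tables and commented-out ones, it may be convenient to collect both in one pass. A sketch under the assumption that tables carry an `id` attribute to key them by; `all_tables` is a hypothetical helper (not part of bs4) and the HTML is a stand-in for the real page:

```python
from bs4 import BeautifulSoup, Comment

def all_tables(html):
    """Return {table id: <table> tag} for visible tables and tables hidden in comments."""
    soup = BeautifulSoup(html, "html.parser")
    found = list(soup.find_all("table"))
    # Re-parse each comment that contains table markup, as in the example above
    for comment in soup.find_all(string=lambda text: isinstance(text, Comment) and "<table" in text):
        found.extend(BeautifulSoup(str(comment), "html.parser").find_all("table"))
    # Tables without an id are keyed by their position instead (an arbitrary choice)
    return {table.get("id", f"table_{i}"): table for i, table in enumerate(found)}

HTML = """
<table id="visible"><tr><td>1</td></tr></table>
<!-- <table id="hidden"><tr><td>2</td></tr></table> -->
<!-- an unrelated comment -->
"""

tables = all_tables(HTML)
print(sorted(tables))  # ['hidden', 'visible']
```

The `'<table' in text` filter skips unrelated comments, so only comment strings that actually hold table markup are parsed a second time.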