Using BeautifulSoup to Web Scrape Tags with CSS IDs

Question

I'm trying to scrape the page below to find the tag with id="advanced.2004" (it does exist on the page). Here is the code I tried:

import requests
from bs4 import BeautifulSoup

webpage = requests.get('https://www.basketball-reference.com/players/j/jamesle01.html')

soup = BeautifulSoup(webpage.content, 'html.parser')

print(soup.find_all(attrs={'id': 'advanced.2004'}))

Thanks in advance for your help!

Answer 1

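The problem is that the element you're looking for is inside an HTML comment (`<!-- ... -->`), so BeautifulSoup treats it as plain comment text rather than as tags. To work around this, loop over every comment on the page, parse each comment's contents with BeautifulSoup, and search that for the element you want: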

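import requests
from bs4 import BeautifulSoup, Comment

url = 'https://www.basketball-reference.com/players/j/jamesle01.html'
webpage = requests.get(url)

soup = BeautifulSoup(webpage.content, 'html.parser')

# The table is shipped inside an HTML comment, so search the
# comments themselves rather than the main parse tree.
el = None
for comment in soup.find_all(string=lambda node: isinstance(node, Comment)):
    # Each comment's text is raw HTML; parse it as its own document.
    comment_html = BeautifulSoup(comment, 'html.parser')
    el = comment_html.find(id='advanced.2004')

    if el is not None:
        break  # stop at the first comment that contains the element

print(el)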
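The comment-walking loop above can also be wrapped in a small reusable function and checked offline, without hitting the live site. This is an illustrative sketch, not part of the original answer: `find_in_comments` and `SAMPLE_HTML` are hypothetical names, and the sample markup only assumes the table is wrapped in `<!-- ... -->` the way the answer describes.

```python
from bs4 import BeautifulSoup, Comment


def find_in_comments(soup, **kwargs):
    """Return the first element matching **kwargs inside any HTML comment.

    Hypothetical helper: kwargs are passed straight to BeautifulSoup.find().
    Returns None when no comment contains a match.
    """
    for comment in soup.find_all(string=lambda node: isinstance(node, Comment)):
        # Parse the comment's raw text as a standalone HTML fragment.
        match = BeautifulSoup(comment, 'html.parser').find(**kwargs)
        if match is not None:
            return match
    return None


# Hypothetical snippet mimicking a table that is shipped inside a comment.
SAMPLE_HTML = """
<div id="all_advanced">
<!--
<table id="advanced">
  <tr id="advanced.2004"><th>2003-04</th></tr>
</table>
-->
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# A normal search misses the row: to the parser it is just comment text.
print(soup.find(id='advanced.2004'))  # None

# Searching inside the comments finds it.
row = find_in_comments(soup, id='advanced.2004')
print(row.th.get_text())  # 2003-04
```

Returning from inside the function replaces the `break`-then-reuse-the-variable pattern, and it makes the "nothing found" case an explicit `None`.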


Tags: python, beautifulsoup, data-analysis