BeautifulSoup | Not able to iterate through div [ js-content-images ] class tag
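Referring to the image below, I want to scrape the name of each disease, the URL associated with the disease, and the disease's icon image. I am not able to iterate through the div [ js-content-images ] tag!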
import requests
from bs4 import BeautifulSoup
URL = "https://dermnetnz.org/image-library"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")
job_elements = soup.find("div", class_="flex [ js-sticky-container ]")
job2 = job_elements.find_all("div", class_="imageList__group")
for job_element in job2:
    print(job_element)
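The reason you can't find it is that those elements are loaded through JavaScript. This is a dynamic website. You can see this by blocking JavaScript execution: the resulting page will be missing the images.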
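One quick way to check this diagnosis is to count how often the target class appears in the raw HTML, before any JavaScript has run. The following is only a sketch: `ClassCounter` and `count_class` are hypothetical helper names, the sample markup is made up for the demonstration, and the standard-library `html.parser` is used instead of bs4 so the snippet runs without extra installs.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts start tags whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # The class attribute is a space-separated list; it may be absent.
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_class(html, class_name):
    """Return how many elements in `html` carry `class_name`."""
    parser = ClassCounter(class_name)
    parser.feed(html)
    return parser.count


# Made-up markup: an empty container, as a server might send it
# before a script fills it in.
static_html = (
    '<div class="flex [ js-sticky-container ]">'
    '<div class="[ js-content-images ]"></div></div>'
)
# The same container after a script has added one group.
rendered_html = static_html.replace(
    "</div></div>",
    '<div class="imageList__group"></div></div></div>',
)

print(count_class(static_html, "imageList__group"))
print(count_class(rendered_html, "imageList__group"))
```

Passing the text of the `requests` response from the question to `count_class` would show whether any `imageList__group` elements exist in the HTML that the server actually sends.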
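You have two options: you can try to reverse engineer the JavaScript, or you can render the JavaScript with a browser rendering engine.
There is Selenium, whose Python bindings are available via pip install selenium. Follow the Selenium installation instructions for your system, because you will also need to install a driver such as Geckodriver or ChromeDriver.
Then you may need to change the following code slightly to make it work for you, but it finds the first element you want and is quite simple: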
# setting up
from selenium import webdriver
from selenium.webdriver.common.by import By
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
# Selenium 4 takes the options via options= (the chrome_options= keyword was removed)
driver = webdriver.Chrome(options=chrome_options)
# your own application
driver.get('https://dermnetnz.org/image-library')
# the find_element_by_* helpers were removed in Selenium 4.3; use By locators instead
element = driver.find_element(By.CLASS_NAME, 'imageList__group__item')
img_element = element.find_element(By.TAG_NAME, 'img')
# here is the link:
print(element.get_attribute('href'))
# here is the text:
print(element.text)
# here is the img source:
print(img_element.get_attribute('src'))
# close the browser when done
driver.quit()
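Want to find more than one of them? Then it is as simple as using elements = driver.find_elements(By.CLASS_NAME, 'imageList__group__item')
instead of element = driver.find_element(By.CLASS_NAME, 'imageList__group__item')
and looping over them, finding the img_element for each. (By comes from selenium.webdriver.common.by; Selenium releases before 4.3 spelled these calls find_elements_by_class_name and find_element_by_class_name.)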
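The loop described above might look like the following sketch. `collect_items` and `main` are hypothetical names, and the code assumes, as the answer does, that each `imageList__group__item` element carries the `href` and contains one `img`. The locator strings `"class name"` and `"tag name"` are the values of Selenium's `By.CLASS_NAME` and `By.TAG_NAME` constants, written out so that the helper itself does not need Selenium imported.

```python
def collect_items(driver):
    """Return a (href, text, img_src) tuple for every item the driver finds."""
    items = []
    # "class name" / "tag name" are the values of By.CLASS_NAME / By.TAG_NAME.
    for element in driver.find_elements("class name", "imageList__group__item"):
        img_element = element.find_element("tag name", "img")
        items.append(
            (
                element.get_attribute("href"),
                element.text,
                img_element.get_attribute("src"),
            )
        )
    return items


def main():
    # Imported here so collect_items can be exercised without a browser.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get("https://dermnetnz.org/image-library")
        for href, text, src in collect_items(driver):
            print(href, text, src)
    finally:
        # Always close the browser, even if scraping fails.
        driver.quit()
```

Calling `main()` requires Selenium and a Chrome driver to be installed. If the items only appear some time after the page loads, an explicit wait may be needed before collecting them.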
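You don't need bs4 or selenium to scrape this page. If you go to the Network tab of your browser's developer tools, you will find the JSON URL; you need to send a request to it and capture the JSON response.
Code: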
import requests
res=requests.get("https://dermnetnz.org/image-library/imagesJson")
result=res.json()
for r in result:
    print("Diseases Name : " + r['name'])
    print("Image : " + r['thumbnail'])
    print("Url : " + "https://dermnetnz.org" + r['url'])
Output:
Diseases Name : Roseola images
Image : https://dermnetnz.org/assets/Uploads/roseola-001__FocusFillWzE1MCwxMTAsInkiLDFd.jpg
Url : https://dermnetnz.org/topics/roseola-images/?stage=Live
Diseases Name : Dermatomyositis images
Image : https://dermnetnz.org/assets/Uploads/dermatomyositis-eyelids-4__FocusFillWzE1MCwxMTAsIngiLDhd.jpg
Url : https://dermnetnz.org/topics/dermatomyositis-images/?stage=Live
Diseases Name : Solar keratosis affecting the face images
Image : https://dermnetnz.org/assets/Uploads/248__FocusFillWzE1MCwxMTAsInkiLDFd.jpg
Url : https://dermnetnz.org/topics/actinic-keratosis-face-images/?stage=Live
Diseases Name : Actinic keratosis affecting the face images
Image : https://dermnetnz.org/assets/Uploads/248__FocusFillWzE1MCwxMTAsInkiLDFd.jpg
Url : https://dermnetnz.org/topics/actinic-keratosis-face-images/?stage=Live
Diseases Name : Solar keratosis affecting the hand images
Image : https://dermnetnz.org/assets/Uploads/393__FocusFillWzE1MCwxMTAsInkiLDFd.jpg
Url : https://dermnetnz.org/topics/actinic-keratosis-affecting-the-hand-images/?stage=Live
Diseases Name : Solar keratosis affecting the legs and feet images
Image : https://dermnetnz.org/assets/Uploads/478__FocusFillWzE1MCwxMTAsInkiLDFd.jpg
Url : https://dermnetnz.org/topics/actinic-keratosis-leg-and-foot-images/?stage=Live
Diseases Name : Solar keratosis affecting the scalp images
Image : https://dermnetnz.org/assets/Uploads/418__FocusFillWzE1MCwxMTAsInkiLDFd.jpg
Url : https://dermnetnz.org/topics/actinic-keratosis-scalp-images/?stage=Live
Diseases Name : Solar keratosis on the nose images
Image : https://dermnetnz.org/assets/Uploads/sks-nose3-s__FocusFillWzE1MCwxMTAsInkiLDFd.jpg
Url : https://dermnetnz.org/topics/actinic-keratosis-on-the-nose-images/?stage=Live
Diseases Name : Solar keratosis treated with imiquimod images
Image : https://dermnetnz.org/assets/Uploads/3723__FocusFillWzE1MCwxMTAsInkiLDFd.jpg
Url : https://dermnetnz.org/topics/actinic-keratosis-imiquimod-images/?stage=Live
Diseases Name : Autoimmune alopecia images
Image : https://dermnetnz.org/assets/Uploads/1323__FocusFillWzE1MCwxMTAsInkiLDIzXQ.jpg
Url : https://dermnetnz.org/topics/alopecia-areata-images/?stage=Live
Diseases Name : Hypomelanotic malignant melanoma images
Image : https://dermnetnz.org/assets/Uploads/12a-amelanotic-melanoma__FocusFillWzE1MCwxMTAsInkiLDFd.jpg
Url : https://dermnetnz.org/topics/amelanotic-melanoma-images/?stage=Live
Diseases Name : Epiloia images
Image : https://dermnetnz.org/assets/Uploads/angiofibromas-19-s__FocusFillWzE1MCwxMTAsInkiLDFd.jpg
Url : https://dermnetnz.org/topics/tuberous-sclerosis-images/?stage=Live
Diseases Name : Perleche images
Image : https://dermnetnz.org/assets/Uploads/perleche13-s__FocusFillWzE1MCwxMTAsInkiLDFd.jpg
Url : https://dermnetnz.org/topics/angular-cheilitis-images/?stage=Live
Diseases Name : Besnier prurigo images
Image : https://dermnetnz.org/assets/Uploads/atopic26-s__FocusFillWzE1MCwxMTAsInkiLDFd.jpg
Url : https://dermnetnz.org/topics/atopic-dermatitis-images/?stage=Live
Diseases Name : Atopic eczema images
Image : https://dermnetnz.org/assets/Uploads/atopic26-s__FocusFillWzE1MCwxMTAsInkiLDFd.jpg
Url : https://dermnetnz.org/topics/atopic-dermatitis-images/?stage=Live
Diseases Name : Atypical melanocytic naevus
Image : https://dermnetnz.org/assets/Uploads/604__FocusFillWzE1MCwxMTAsInkiLDFd.jpg
Url : https://dermnetnz.org/topics/atypical-naevus-images/?stage=Live
Diseases Name : Bacteria images
Image : https://dermnetnz.org/assets/Uploads/syph6-s-2__FocusFillWzE1MCwxMTAsInkiLDFd.jpg
Url : https://dermnetnz.org/image-catalogue/bacterial-skin-infection-images/?stage=Live
...and so on
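If the records need to be reused rather than printed, the same request can be wrapped in small helpers. This is a sketch under assumptions: `to_row` and `fetch_rows` are hypothetical names, the timeout value is arbitrary, and the relative `url` in the sample record is inferred from the way the answer's code prepends the site root.

```python
from urllib.parse import urljoin

BASE_URL = "https://dermnetnz.org"


def to_row(record, base_url=BASE_URL):
    """Turn one JSON record into a (name, thumbnail, absolute_url) tuple."""
    return (
        record.get("name", ""),
        record.get("thumbnail", ""),
        # urljoin resolves the site-relative path against the site root.
        urljoin(base_url, record.get("url", "")),
    )


def fetch_rows(endpoint=BASE_URL + "/image-library/imagesJson"):
    """Request the JSON endpoint from the answer and convert every record."""
    # Imported lazily so to_row can be used without requests installed.
    import requests

    res = requests.get(endpoint, timeout=30)
    # Fail loudly on HTTP errors instead of trying to parse an error page.
    res.raise_for_status()
    return [to_row(r) for r in res.json()]


# Sample record shaped like the first entry in the output above
# (the relative "url" value is an assumption).
sample = {
    "name": "Roseola images",
    "thumbnail": "https://dermnetnz.org/assets/Uploads/roseola-001__FocusFillWzE1MCwxMTAsInkiLDFd.jpg",
    "url": "/topics/roseola-images/?stage=Live",
}
print(to_row(sample))
```

Using `.get` with a default keeps one incomplete record from aborting the whole run, at the cost of silently producing empty fields.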