How to handle lazy loading?
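In this part of my program, I am trying to get the links of all the images on a web page. However, the images are lazy loaded. It is not the kind of lazy loading where images appear as you scroll down, though: the images are already there. There are 30 products on each page. Scrolling down the page does not help either. How can I handle this?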
from selenium import webdriver
import os
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
from selenium.webdriver.common.action_chains import ActionChains

url = 'https://www.bookdepository.com/category/2/Art-Photography/browse/viewmode/all'
driver = webdriver.Chrome('blabla')
driver.get(url)

a = 1
while a != 100:
    try:
        link_picture = WebDriverWait(driver, 2).until(EC.visibility_of_element_located((By.XPATH, "/html/body/div[4]/div[5]/div[2]/div[3]/div/div/div/div/div[" + str(a) + "]/div[1]/a/img"))).get_attribute('src')
        print(link_picture)
    except:
        print("\nno products left")
        #e = a - 1
        #print(a)
        #print(e)
        break
    a = a + 1
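The page is actually loaded via JavaScript, which is rendered internally on the host itself once the page loads. Handling this lazy loading in Selenium WebDriver with implicit, explicit, or fluent waits is therefore a very poor approach, and it would take a lot of time.
A smarter route is to use requests and bs4. We collect the image IDs (the code below reads them from each product's ISBN) and then plug them into the usual image URL pattern on the site.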
Note: I've checked manually and can confirm that the CloudFront token ID for that site is static: d1w7fb2mkkr3kw.
import requests
from bs4 import BeautifulSoup

r = requests.get(
    "https://www.bookdepository.com/category/2/Art-Photography/browse/viewmode/all")
soup = BeautifulSoup(r.text, 'html.parser')

# Base URL of the cover images on the site's CloudFront host.
url = "https://d1w7fb2mkkr3kw.cloudfront.net/assets/images/book/mid/"

# Each product exposes its ISBN in a <meta itemprop="isbn"> tag.
for item in soup.find_all("meta", itemprop="isbn"):
    isbn = item.get("content")
    # Image path: first four digits / next four digits / full ISBN.
    print(f"{url}{isbn[:4]}/{isbn[4:8]}/{isbn}.jpg")
Output:
https://d1w7fb2mkkr3kw.cloudfront.net/assets/images/book/mid/9781/4087/9781408708989.jpg
https://d1w7fb2mkkr3kw.cloudfront.net/assets/images/book/mid/9781/5098/9781509853311.jpg
https://d1w7fb2mkkr3kw.cloudfront.net/assets/images/book/mid/9780/7522/9780752265629.jpg
https://d1w7fb2mkkr3kw.cloudfront.net/assets/images/book/mid/9780/1410/9780141014081.jpg
https://d1w7fb2mkkr3kw.cloudfront.net/assets/images/book/mid/9781/2500/9781250038821.jpg
https://d1w7fb2mkkr3kw.cloudfront.net/assets/images/book/mid/9780/1410/9780141035796.jpg
https://d1w7fb2mkkr3kw.cloudfront.net/assets/images/book/mid/9781/8499/9781849941679.jpg
https://d1w7fb2mkkr3kw.cloudfront.net/assets/images/book/mid/9780/0995/9780099539551.jpg
https://d1w7fb2mkkr3kw.cloudfront.net/assets/images/book/mid/9781/4087/9781408711705.jpg
https://d1w7fb2mkkr3kw.cloudfront.net/assets/images/book/mid/9788/8837/9788883701153.jpg
https://d1w7fb2mkkr3kw.cloudfront.net/assets/images/book/mid/9780/7475/9780747568766.jpg
https://d1w7fb2mkkr3kw.cloudfront.net/assets/images/book/mid/9781/8609/9781860969423.jpg
https://d1w7fb2mkkr3kw.cloudfront.net/assets/images/book/mid/9781/4711/9781471157790.jpg
https://d1w7fb2mkkr3kw.cloudfront.net/assets/images/book/mid/9781/5098/9781509829477.jpg
https://d1w7fb2mkkr3kw.cloudfront.net/assets/images/book/mid/9780/7515/9780751535662.jpg
https://d1w7fb2mkkr3kw.cloudfront.net/assets/images/book/mid/9780/5005/9780500513606.jpg
https://d1w7fb2mkkr3kw.cloudfront.net/assets/images/book/mid/9781/4088/9781408890769.jpg
https://d1w7fb2mkkr3kw.cloudfront.net/assets/images/book/mid/9780/8771/9780877180128.jpg
https://d1w7fb2mkkr3kw.cloudfront.net/assets/images/book/mid/9788/8837/9788883705601.jpg
https://d1w7fb2mkkr3kw.cloudfront.net/assets/images/book/mid/9781/4722/9781472200341.jpg
https://d1w7fb2mkkr3kw.cloudfront.net/assets/images/book/mid/9781/8478/9781847807717.jpg
https://d1w7fb2mkkr3kw.cloudfront.net/assets/images/book/mid/9781/8609/9781860969430.jpg
https://d1w7fb2mkkr3kw.cloudfront.net/assets/images/book/mid/9780/3409/9780340936177.jpg
https://d1w7fb2mkkr3kw.cloudfront.net/assets/images/book/mid/9780/4862/9780486254500.jpg
https://d1w7fb2mkkr3kw.cloudfront.net/assets/images/book/mid/9780/7160/9780716022237.jpg
https://d1w7fb2mkkr3kw.cloudfront.net/assets/images/book/mid/9780/0994/9780099457046.jpg
https://d1w7fb2mkkr3kw.cloudfront.net/assets/images/book/mid/9781/4521/9781452106557.jpg
https://d1w7fb2mkkr3kw.cloudfront.net/assets/images/book/mid/9780/2411/9780241184837.jpg
https://d1w7fb2mkkr3kw.cloudfront.net/assets/images/book/mid/9781/7423/9781742372389.jpg
https://d1w7fb2mkkr3kw.cloudfront.net/assets/images/book/mid/9781/8499/9781849942850.jpg
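The URL pattern can also be isolated into a small helper, which makes it possible to check it against the printed output without fetching the page. This is only a minimal sketch: the function name `cover_url` and the 13-digit validation are my own additions, and the base URL and the 4/4 digit split are taken from the answer's code on the assumption that the site's CloudFront host and path layout stay the same.

```python
# Base URL copied from the answer's code; assumed to stay static.
BASE_URL = "https://d1w7fb2mkkr3kw.cloudfront.net/assets/images/book/mid/"


def cover_url(isbn: str, base_url: str = BASE_URL) -> str:
    """Build the cover image URL for a 13-digit ISBN (hypothetical helper)."""
    isbn = isbn.strip()
    # Validation is an added assumption: the listed ISBNs are all 13 digits.
    if len(isbn) != 13 or not isbn.isdigit():
        raise ValueError(f"expected a 13-digit ISBN, got {isbn!r}")
    # Path layout: first four digits / next four digits / full ISBN.
    return f"{base_url}{isbn[:4]}/{isbn[4:8]}/{isbn}.jpg"


if __name__ == "__main__":
    print(cover_url("9781408708989"))
```

Calling `cover_url("9781408708989")` reproduces the first URL in the output list above, so the helper can stand in for the f-string inside the loop.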
Update:
import requests
from bs4 import BeautifulSoup

r = requests.get(
    "https://www.bookdepository.com/category/2/Art-Photography/browse/viewmode/all")
soup = BeautifulSoup(r.text, 'html.parser')

# Read the image URL from the data-lazy attribute of each lazy image.
for item in soup.find_all("img", class_="lazy"):
    print(item.get("data-lazy"))
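To see what the update relies on without hitting the site, the same idea can be tried on a small inline snippet. Treat the following as a rough sketch under stated assumptions: the sample markup is made up rather than copied from the real page, the names `LazyImageParser` and `extract_lazy_urls` are hypothetical, and it uses the standard library's `html.parser` instead of bs4 so that it runs with no extra packages. It reads `data-lazy` from every `img` whose class list contains `lazy`, and (as an addition not in the original) falls back to `src` when `data-lazy` is missing.

```python
from html.parser import HTMLParser


class LazyImageParser(HTMLParser):
    """Collect image URLs from <img class="lazy" data-lazy="..."> tags."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        # Valueless attributes arrive as None, hence the "or" fallback.
        classes = (attr.get("class") or "").split()
        if "lazy" not in classes:
            return
        # Prefer the deferred URL; fall back to src (an added assumption).
        url = attr.get("data-lazy") or attr.get("src")
        if url:
            self.urls.append(url)


def extract_lazy_urls(html: str) -> list:
    """Return the image URLs of all lazy <img> tags found in html."""
    parser = LazyImageParser()
    parser.feed(html)
    parser.close()
    return parser.urls


# Made-up sample markup, not copied from the real page.
SAMPLE_HTML = """
<div class="book-item">
  <a href="/book/1"><img class="lazy" src="placeholder.gif"
     data-lazy="https://example.com/covers/1.jpg" alt="Book 1"></a>
  <a href="/book/2"><img class="lazy loaded" src="https://example.com/covers/2.jpg"></a>
  <img class="logo" src="https://example.com/logo.png">
</div>
"""

if __name__ == "__main__":
    for url in extract_lazy_urls(SAMPLE_HTML):
        print(url)
```

Reading the attribute straight from the HTML is what makes a browser unnecessary here: the URL is already present in the markup, so there is nothing to wait for.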