Python web scraping | How to handle missing elements with try and except so that "Not available" is printed when an element is not found?
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq
import bs4
headers = {'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/83.0.4103.116 Safari/537.36'}
my_url = 'https://www.jiomart.com/c/groceries/dairy-bakery/dairy/62'
uclient = uReq(my_url)
page_html = uclient.read()
uclient.close()
bs41 = soup(page_html, 'html.parser')
containers = bs41.find_all('div', {'col-md-3 p-0'})
#print(len(containers))
#print(soup.prettify(containers[0]))
for container in containers:
    p_name = container.find_all('span', {'class' : 'clsgetname'})
    productname = p_name[0].text
    o_p = container.find_all('span' , id = 'final_price' )
    offer_price = o_p[0].text
    try:
        ap = container.find_all('strike', id = 'price')
        actual_price = ap[0].text
    except:
        print('not available')
    print('Product name is', productname)
    print('Product Mrp is', offer_price)
    print('Product actual price', actual_price)
    print()
When I run the code above, one product has no actual price, only an offer price, while all the other products have both values. I am trying to handle this case with try and except by printing 'Not Available', but it is not working.
Instead, it prints 'not available' on the first line and still shows an actual price of Rs 35, even though that product has no actual price.
How should I deal with this?
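The problem is that even when the element is not found, the code still prints actual_price, which probably still exists in the outer scope (most likely holding the value left over from the previous product in the loop).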
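To see why a stale price shows up, here is a minimal sketch that needs no network access. The `results` list is made-up sample data: each inner list stands in for what `find_all()` would return for one product, and an empty list means the element was missing.

```python
# Each inner list stands in for the result of find_all() on one product;
# an empty list means the element was not found. (Made-up sample data.)
results = [["Rs 35"], [], ["Rs 50"]]

printed = []
for found in results:
    try:
        actual_price = found[0]
    except IndexError:
        # Nothing is assigned here, so actual_price keeps its old value.
        pass
    printed.append(actual_price)

print(printed)  # ['Rs 35', 'Rs 35', 'Rs 50']
```

The second product has no price, yet the value from the first product is reported for it, which matches the behaviour described in the question.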
There are two ways to solve this.
- The first is to print only when the element is found, which you can do like this:
    try:
        ap = container.find_all('strike', id = 'price')
        actual_price = ap[0].text  # raises IndexError if nothing was found
        print('Product name is', productname)
        print('Product Mrp is', offer_price)
        print('Product actual price', actual_price)
    except IndexError:
        print('not available')
- The second is to set actual_price to a placeholder, so that the placeholder is printed next to 'Product actual price'. To do this, you only need to add actual_price = 'not found' in the except block, so your code becomes:
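    try:
        ap = container.find_all('strike', id = 'price')
        actual_price = ap[0].text  # raises IndexError if nothing was found
    except IndexError:
        print('not available')
        actual_price = 'not found'  # overwrite any value left from the previous product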
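If several fields can be missing, the second approach could be wrapped in a small helper. This is only a sketch: `first_text` is a hypothetical name, not part of BeautifulSoup, and the `SimpleNamespace` objects below are stand-ins for the tags that `find_all()` returns, so the example runs without fetching the page.

```python
from types import SimpleNamespace


def first_text(elements, default="Not available"):
    """Return the stripped text of the first element, or default if the list is empty."""
    try:
        return elements[0].text.strip()
    except IndexError:
        return default


# Stand-ins for the lists that find_all() returns (made-up sample data).
with_price = [SimpleNamespace(text=" Rs 35 ")]
without_price = []

print("Product actual price", first_text(with_price))
print("Product actual price", first_text(without_price))
```

Inside the loop from the question, this would be called as `actual_price = first_text(container.find_all('strike', id='price'))`, so every product gets either its own price or the placeholder and never a value left over from an earlier one.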