How to scrape the text outside of href with BeautifulSoup
I am trying to scrape the text "Woodford Reserve Master Collection Five Malt Stouted Mash" from the following:
<a class="catalog_item_name" aria-hidden="true" tabindex="-1" id="WC_CatalogEntryDBThumbnailDisplayJSPF_3074457345616901168_link_9b" href="/webapp/wcs/stores/servlet/ProductDisplay?catalogId=10051&storeId=10051&productId=3074457345616901168&langId=-1&partNumber=000086630prod&errorViewName=ProductDisplayErrorView&categoryId=1334014&top_category=25208&parent_category_rn=1334013&urlLangId=&variety=American+Whiskey&categoryType=Spirits&fromURL=%2fwebapp%2fwcs%2fstores%2fservlet%2fCatalogSearchResultView%3fstoreId%3d10051%26catalogId%3d10051%26langId%3d-1%26categoryId%3d1334014%26variety%3dAmerican%2bWhiskey%26categoryType%3dSpirits%26top_category%3d%26parent_category_rn%3d%26sortBy%3d5%26searchSource%3dE%26pageView%3d%26beginIndex%3d">Woodford Reserve Master Collection Five Malt Stouted Mash</a>
I can scrape the href with the following code, but I can't seem to scrape the title text on its own:
link = []
for product in soup.select('a.catalog_item_name'):
    link.append(product['href'])
print(link)
I have also tried:
for product in soup.select('a.catalog_item_name'):
    link.append(product.a['href'])
print(link)
But I can't seem to capture just the title text by itself. Thanks in advance for your help!
Try:
data = []
for product in soup.select('a.catalog_item_name'):
    link = product['href']        # the href attribute of the <a> tag
    title = product.get_text()    # the text between <a> and </a>
    data.append([link, title])
print(data)
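For reference, here is a minimal self-contained sketch of the same idea. The HTML string is a shortened, hypothetical stand-in for the real page (the actual href is much longer), and `html.parser` is just one parser choice. Each `product` yielded by `select` is already the `<a>` tag, so `product['href']` reads its attribute and `get_text()` reads its text. `product.a` looks for an `<a>` nested inside it, which here does not exist, so it is `None`.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical stand-in for the page; the real href is much longer.
html = """
<a class="catalog_item_name" href="/webapp/wcs/stores/servlet/ProductDisplay?productId=3074457345616901168">
  Woodford Reserve Master Collection Five Malt Stouted Mash
</a>
"""

soup = BeautifulSoup(html, "html.parser")

data = []
for product in soup.select("a.catalog_item_name"):
    link = product["href"]                # attribute lookup on the <a> tag itself
    title = product.get_text(strip=True)  # text between <a> and </a>, whitespace trimmed
    data.append((link, title))

# product.a searches for an <a> nested inside this <a>; there is none here.
nested_anchor = soup.select_one("a.catalog_item_name").a

print(data)
```

Passing `strip=True` is optional; it only trims the leading and trailing whitespace that tends to surround text in real pages.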