Python: Extract all children of a <div> tag using BeautifulSoup
The markup looks like this:
<div class="zg_itemWrapper" style="height:315px">
<div class="zg_title"><a href="http://www.amazon.in/Taste-Suspense-Action-Thriller-Mystery-ebook/dp/B00JKN41ZS/ref=zg_bs_1637004031_f_2">The Taste of Fear (A Suspense Action...</a>
</div>
<div class="zg_byline">by Jeremy Bates</div>
<div class="zg_price">Free</div>
</div>
<div class="zg_itemWrapper" style="height:315px">
<div class="zg_title"><a href="http://www.amazon.in/Taste-Suspense-Action-Thriller-Mystery-ebook/dp/B00JKN41ZS/ref=zg_bs_1637004031_f_2">Another Book</a>
</div>
<div class="zg_byline">by Jeremy</div>
<div class="zg_price">Free</div>
</div>
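I am using BeautifulSoup to read a web page and extract a few details: title, author, price, and link.
The code I tried extracts only one of them (the title), but I want every field of each item in a collection.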
items = soup.find_all("div", {"class": "zg_itemWrapper"})
for item in items:
    titles = item.find_all("div", {"class": "zg_title"})
    for title in titles:
        print(title.text)
You are on the right track.
Inside each "zg_itemWrapper", use find() with the class name to locate each field:
items = soup.find_all("div", {"class": "zg_itemWrapper"})
for item in items:
    # The title text and the link both live on the <a> inside zg_title.
    title_elm = item.find("div", {"class": "zg_title"}).a
    title = title_elm.get_text()
    link = title_elm["href"]
    author = item.find("div", {"class": "zg_byline"}).get_text()
    price = item.find("div", {"class": "zg_price"}).get_text()
    print(title, link, author, price)
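Since the question asks for the results in a collection rather than printed output, here is a minimal self-contained sketch of the same approach that gathers each item into a list of dictionaries. The `extract_items` helper, the `HTML` sample, and the shortened example.com URLs are assumptions made for illustration, not part of the original page. The `None` checks are an added precaution for items that lack one of the fields.

```python
from bs4 import BeautifulSoup

# Sample markup modeled on the question; the URLs are shortened placeholders.
HTML = """
<div class="zg_itemWrapper" style="height:315px">
<div class="zg_title"><a href="http://example.com/book-1">The Taste of Fear</a>
</div>
<div class="zg_byline">by Jeremy Bates</div>
<div class="zg_price">Free</div>
</div>
<div class="zg_itemWrapper" style="height:315px">
<div class="zg_title"><a href="http://example.com/book-2">Another Book</a>
</div>
<div class="zg_byline">by Jeremy</div>
<div class="zg_price">Free</div>
</div>
"""


def extract_items(html):
    """Return one dict per zg_itemWrapper with title, link, author, price."""
    soup = BeautifulSoup(html, "html.parser")
    books = []
    for item in soup.find_all("div", {"class": "zg_itemWrapper"}):
        title_div = item.find("div", {"class": "zg_title"})
        link_tag = title_div.a if title_div else None
        if link_tag is None:
            # Skip wrappers that have no title link at all.
            continue
        author = item.find("div", {"class": "zg_byline"})
        price = item.find("div", {"class": "zg_price"})
        books.append({
            "title": link_tag.get_text(strip=True),
            "link": link_tag.get("href"),
            "author": author.get_text(strip=True) if author else None,
            "price": price.get_text(strip=True) if price else None,
        })
    return books


books = extract_items(HTML)
for book in books:
    print(book["title"], book["link"], book["author"], book["price"])
```

Each entry in `books` can then be filtered, sorted, or written out (for example to CSV) without parsing the page again.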