How do I extract the text after an <i class> tag?
I am trying to use BeautifulSoup to print the text 'Dealer' from a div class, but I don't know how to extract it.
I tried printing the i class, but the Dealer text was not printed.
import requests
from bs4 import BeautifulSoup
url = 'https://www.carlist.my/used-cars-for-sale/proton/malaysia'
response = requests.get(url, params={'page_number': 1})
soup = BeautifulSoup(response.text, 'lxml')
articles = soup.find_all('article')[:25]
seller_type = articles[4].find('div', class_ = 'item push-quarter--ends listing__spec--dealer')
seller_type_text = articles[4].find('i', class_ = 'icon icon--secondary muted valign--top push-quarter--right icon--user-formal')
print(seller_type.prettify())
print()
print(seller_type_text)
This is the output I get:
<div class="item push-quarter--ends listing__spec--dealer">
<i class="icon icon--secondary muted valign--top push-quarter--right icon--user-formal">
</i>
Dealer
<span class="flyout listing__badge listing__badge--trusted-seller inline--block valign--top push-quarter--left">
<i class="icon icon--thumb-up">
</i>
<span class="flyout__content flyout__content--tip visuallyhidden--portable">
This 'Trusted Dealer' has a proven track record of upholding the best car selling practices certified by Carlist.my
</span>
</span>
<!-- used car -->
<!-- BMW -->
</div>
<i class="icon icon--secondary muted valign--top push-quarter--right icon--user-formal"></i>
How can I print the word 'Dealer' that comes after the i class and before the span class?
Can anyone help me?
Thanks a lot!
Look at the contents attribute of your seller_type. You will see that Dealer is at seller_type.contents[2]. That is,
import requests
from bs4 import BeautifulSoup
url = 'https://www.carlist.my/used-cars-for-sale/proton/malaysia?profile_type=Dealer'
response = requests.get(url, params={'page_number': 1})
soup = BeautifulSoup(response.text, 'lxml')
articles = soup.find_all('article')[:25]
seller_type = articles[4].find('div', class_ = 'item push-quarter--ends listing__spec--dealer')
print(seller_type.contents[2])
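To see why index 2 is the right one, here is a minimal offline sketch that parses a trimmed copy of the div from the question instead of the live page. The `html` string and the use of `html.parser` are assumptions made for illustration; the real page may contain different whitespace, which would shift the indices.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup shown in the question (an assumption:
# the live page may differ in whitespace and nesting).
html = """<div class="item push-quarter--ends listing__spec--dealer">
<i class="icon icon--secondary muted valign--top push-quarter--right icon--user-formal"></i>
Dealer
<span class="flyout listing__badge listing__badge--trusted-seller"><i class="icon icon--thumb-up"></i></span>
</div>"""

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="listing__spec--dealer")

# List every direct child so the position of the text node is visible.
for index, node in enumerate(div.contents):
    print(index, type(node).__name__, repr(str(node)))

# The text node keeps its surrounding newlines, so strip them.
seller_type = div.contents[2].strip()
print(seller_type)
```

Indexing into contents depends on the exact whitespace between tags, so it is worth printing the children like this before hard-coding a position.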
Using one of the compound class names of the i tag element together with next_sibling is a faster approach.
If you inspect the HTML, you will see that "Dealer" is part of the i tag's parent div and follows the i tag; so you can grab the i tag and then use next_sibling:
from bs4 import BeautifulSoup as bs
import requests
r = requests.get('https://www.carlist.my/used-cars-for-sale/proton/malaysia')
soup = bs(r.content, 'lxml')
print(soup.select_one('.icon--user-formal').next_sibling)
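A slightly more defensive variant of the same idea can be sketched as follows. It runs against a trimmed copy of the markup from the question rather than the live site; the `html` string, the `html.parser` choice, and the `None` fallback are assumptions for illustration, not part of the original answer.

```python
from bs4 import BeautifulSoup, NavigableString

# Trimmed copy of the markup shown in the question (an assumption:
# the live page may be structured differently).
html = """<div class="item push-quarter--ends listing__spec--dealer">
<i class="icon icon--secondary muted valign--top push-quarter--right icon--user-formal"></i>
Dealer
<span class="flyout listing__badge listing__badge--trusted-seller"><i class="icon icon--thumb-up"></i></span>
</div>"""

soup = BeautifulSoup(html, "html.parser")

# Select the icon by one of its class names; select_one returns None
# when nothing matches, so guard before touching next_sibling.
icon = soup.select_one(".icon--user-formal")
sibling = icon.next_sibling if icon is not None else None

# next_sibling may be a tag or None; only a text node holds the label.
if isinstance(sibling, NavigableString):
    seller_type = sibling.strip()
else:
    seller_type = None

print(seller_type)
```

The strip() call removes the newlines that surround the text node, and the guards avoid an AttributeError when a listing has no such icon.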