Web scraping: issues with using findAll in BeautifulSoup
I am trying to get all the languages from this page: https://lawyers.justia.com/lawyer/ali-shahrestani-esq-198352
This line of code only gives me part of what I want.
soup.findAll("div",{"class":"block-wrapper block"})
Output: '[English: Spoken, Written]'
Going by the tag, I also tried
soup.findAll("ul",{"class":"has-no-list-styles"})
Output: 'Personal InjuryProducts LiabilityElder LawConsumer LawDUI & DWIEmployment Law'
I think this should do it:
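import requests
from bs4 import BeautifulSoup as bs

url = 'https://lawyers.justia.com/lawyer/ali-shahrestani-esq-198352'
data = requests.get(url)
soup = bs(data.text, 'lxml')

# Find the section heading divs by their full class string.
target = soup.find_all("div", {"class": "heading-3 block-title iconed-heading font-w-bold"})
for t in target:
    # Keep only the heading that contains the languages icon.
    if t.find('span', class_="jicon -large jicon-languages"):
        # The element right after that heading holds the list of languages.
        langs = t.find_next_sibling()
        for lang in langs.find_all('li'):
            print(lang.text)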
Output:
English: Spoken, Written
French: Spoken, Written
Italian: Spoken, Written
Persian: Spoken
Spanish: Spoken, Written
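
The same idea can be tried offline. The sketch below is only an illustration under stated assumptions: SAMPLE_HTML is hypothetical, simplified markup that reuses the class names from the code above (the `jicon-practice-areas` class and the exact nesting are guesses, and the real page may differ), and `extract_languages` is a helper name made up for this example. It finds the languages icon with a CSS selector, which matches on a single class regardless of the order of the others, then walks up to the heading and across to the list that follows it.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup modelled on the class names used above.
# The real page may be structured differently.
SAMPLE_HTML = """
<div class="block-wrapper block">
  <div class="heading-3 block-title iconed-heading font-w-bold">
    <span class="jicon -large jicon-practice-areas"></span>Practice Areas
  </div>
  <ul class="has-no-list-styles"><li>Personal Injury</li><li>Elder Law</li></ul>
</div>
<div class="block-wrapper block">
  <div class="heading-3 block-title iconed-heading font-w-bold">
    <span class="jicon -large jicon-languages"></span>Languages
  </div>
  <ul class="has-no-list-styles"><li>English: Spoken, Written</li><li>Persian: Spoken</li></ul>
</div>
"""


def extract_languages(html):
    """Return the text of each <li> in the list that follows the languages heading."""
    soup = BeautifulSoup(html, "html.parser")
    # A CSS class selector matches one class, whatever else the element carries.
    icon = soup.select_one("span.jicon-languages")
    if icon is None:
        return []
    # The icon sits inside the heading div; the list is the heading's next <ul> sibling.
    heading = icon.find_parent("div")
    items = heading.find_next_sibling("ul")
    if items is None:
        return []
    return [li.get_text(strip=True) for li in items.find_all("li")]


if __name__ == "__main__":
    for language in extract_languages(SAMPLE_HTML):
        print(language)
```

Both lists in the sample share the class has-no-list-styles, which is why selecting by that class alone picks up the practice areas as well. Anchoring on the icon in the heading is what separates the two sections.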