How can I get the second span using BeautifulSoup in Python?
I'm trying to get the value of the second span in this div and in other divs like it (shown below):
<div class="C(#959595) Fz(11px) D(ib) Mb(6px)">
<span>VALUE 1</span>
<i aria-hidden="true" class="Mx(4px)">•</i>
<span>TRYING TO GET THIS</span>
</div>
I've looked at similar Stack Overflow posts, but I still can't figure out how to solve this.
Here is my current code:
time = soup.find_all('div', {'class': 'C(#959595) Fz(11px) D(ib) Mb(6px)'})
for i in time:
    print(i.text)  # this prints VALUE 1 x amount of times (there are multiple divs)
I've tried i.span, i.contents, i.children, and so on.
Any help is much appreciated, thanks!
Try this:
from io import StringIO
from bs4 import BeautifulSoup as bs
data = """<div class="C(#959595) Fz(11px) D(ib) Mb(6px)">
<span>VALUE 1</span>
<i aria-hidden="true" class="Mx(4px)">•</i>
<span>TRYING TO GET THIS</span>
</div>
<div class="another class">
<span>VALUE 1</span>
<i aria-hidden="true" class="Mx(4px)">•</i>
<span>TRYING TO GET THIS</span>
</div>"""
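soup = bs(StringIO(data), 'html.parser')  # name the parser explicitly to avoid a warning
# '>' selects only the <span> elements that are direct children of the matching div
spans = soup.select('div[class="C(#959595) Fz(11px) D(ib) Mb(6px)"] > span')
print(spans[1].text)  # second span of the first matching div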
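Note that `spans` above is one flat list, so `spans[1]` only covers the first matching div. A minimal sketch of getting the second span from every matching div in one go, assuming the CSS `:nth-of-type` pseudo-class (supported by the selector engine bundled with recent BeautifulSoup releases); the sample values `SECOND A` and `SECOND B` are made up for illustration:

```python
from bs4 import BeautifulSoup

# Sample markup with two matching divs (values are hypothetical)
data = """
<div class="C(#959595) Fz(11px) D(ib) Mb(6px)">
<span>VALUE 1</span>
<i aria-hidden="true" class="Mx(4px)">•</i>
<span>SECOND A</span>
</div>
<div class="C(#959595) Fz(11px) D(ib) Mb(6px)">
<span>VALUE 2</span>
<i aria-hidden="true" class="Mx(4px)">•</i>
<span>SECOND B</span>
</div>
"""

soup = BeautifulSoup(data, "html.parser")

# span:nth-of-type(2) counts only <span> siblings, so the <i> in between is ignored
selector = 'div[class="C(#959595) Fz(11px) D(ib) Mb(6px)"] > span:nth-of-type(2)'
second_spans = [span.get_text(strip=True) for span in soup.select(selector)]
print(second_spans)
```

This returns one entry per div that actually has a second span; divs with a single span are simply skipped.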
You're basically there; you just need to get the second span in each div (using find_next):
from bs4 import BeautifulSoup

# HTML is assumed to hold the page markup as a string
soup = BeautifulSoup(HTML, 'html.parser')
divs = soup.find_all('div', {'class': 'C(#959595) Fz(11px) D(ib) Mb(6px)'})
for div in divs:
    # want the second span in the div
    span = div.find_next('span').find_next('span')
    print(span.string)
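One caveat with find_next: it walks forward through the whole document rather than staying inside the div, so a div with only one span would yield a span from a later element. A hedged sketch of a scoped alternative that searches inside each div and returns None when there is no second span; the helper name `second_span_texts` and the sample markup are hypothetical:

```python
from bs4 import BeautifulSoup

# Sample markup: the first div has only one span (hypothetical data)
html = """
<div class="C(#959595) Fz(11px) D(ib) Mb(6px)">
<span>ONLY ONE SPAN</span>
</div>
<div class="C(#959595) Fz(11px) D(ib) Mb(6px)">
<span>VALUE 1</span>
<i aria-hidden="true" class="Mx(4px)">•</i>
<span>TRYING TO GET THIS</span>
</div>
"""


def second_span_texts(markup, css_class):
    """Return the text of the second <span> in each matching div, or None."""
    soup = BeautifulSoup(markup, "html.parser")
    results = []
    for div in soup.find_all("div", class_=css_class):
        # find_all on the div only looks at that div's descendants
        spans = div.find_all("span")
        results.append(spans[1].get_text(strip=True) if len(spans) > 1 else None)
    return results


texts = second_span_texts(html, "C(#959595) Fz(11px) D(ib) Mb(6px)")
print(texts)
```

Keeping a None placeholder preserves the one-to-one mapping between divs and results, which may matter if the values are later lined up with other columns.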
divs = soup.find_all('div', class_='C(#959595) Fz(11px) D(ib) Mb(6px)')
[x.get_text() for x in divs[0].find_all('span')]
# output
Out[17]:
['VALUE 1', 'TRYING TO GET THIS']
There are several ways to get the value you want.
from simplified_scrapy.simplified_doc import SimplifiedDoc
html='''
<div class="C(#959595) Fz(11px) D(ib) Mb(6px)">
<span>VALUE 1</span>
<i aria-hidden="true" class="Mx(4px)">•</i>
<span>TRYING TO GET THIS</span>
</div>
'''
doc = SimplifiedDoc(html)
divs = doc.getElementsByClass('C(#959595) Fz(11px) D(ib) Mb(6px)')
for div in divs:
    value = div.getElementByTag('span', start='</span>')  # Use start to skip the first
    print(value)
    value = div.getElementByTag('span', before='<span>', end=len(div.html))  # Locate the last
    print(value)
    value = div.i.next  # Use <i> to locate
    print(value)
    value = div.spans[-1]
    print(value)
    print(value.text)
Result:
{'tag': 'span', 'html': 'TRYING TO GET THIS'}
{'tag': 'span', 'html': 'TRYING TO GET THIS'}
{'tag': 'span', 'html': 'TRYING TO GET THIS'}
{'tag': 'span', 'html': 'TRYING TO GET THIS'}
TRYING TO GET THIS