Using find_previous to find the URL before a specific class
I want to find the previous tag before class='field-news-pillars'.
The HTML looks like this:
<span class="field-content"><a href="/news/building-business">Building a Business</a></span>
<div> <div>
<span class="date-display-single">Jun 29, 2020</span></div> </div>
<div> <div>
<div class="field-news-pillars">
Entrepreneurial Spirit </div>
</div> </div>
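I want to get the href link that sits above the element with class="field-news-pillars" whose text is 'Entrepreneurial Spirit'.
I know there are simpler ways to pull href links out of the HTML, but I am trying to filter all the links: I only want to select the links that are the previous tag before class="field-news-pillars". This is what I tried.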
last_link = soup.find(class_='field-news-pillars', text ='Entrepreneurial Spirit' )
print(last_link.find_previous('a')['href'])
error: AttributeError: 'NoneType' object has no attribute 'find_previous'
Any ideas? Thanks!
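Pass the tag name to find and drop the exact text filter. A plain string passed as text has to equal the element's whole string, and here that string carries a leading newline and a trailing space, so your call matches nothing and returns None.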
In [174]: html = """<span class="field-content"><a href="/news/building-business">Building a Business</a></span>
...: <div> <div>
...: <span class="date-display-single">Jun 29, 2020</span></div> </div>
...: <div> <div>
...: <div class="field-news-pillars">
...: Entrepreneurial Spirit </div>
...: </div> </div>"""
In [175]: soup = BeautifulSoup(html, "html.parser")
In [176]: last_link = soup.find("div", class_='field-news-pillars')
In [177]: print(last_link.find_previous('a')['href'])
/news/building-business
If you want to filter by text:
In [189]: import re
In [190]: last_link = soup.find("div", class_='field-news-pillars', text=re.compile('Entrepreneurial Spirit'))
In [191]: print(last_link.find_previous('a')['href'])
/news/building-business
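To see why the exact-string filter in the question comes back empty, here is a minimal sketch that prints what the div's string really contains and then matches on the stripped text instead of a regex. The helper name is_pillar is hypothetical and not part of BeautifulSoup, and the string keyword assumes a bs4 release that accepts it in place of text.

```python
from bs4 import BeautifulSoup

html = """<span class="field-content"><a href="/news/building-business">Building a Business</a></span>
<div> <div>
<span class="date-display-single">Jun 29, 2020</span></div> </div>
<div> <div>
<div class="field-news-pillars">
Entrepreneurial Spirit </div>
</div> </div>"""

soup = BeautifulSoup(html, "html.parser")

# The div's only child is a text node that keeps its surrounding whitespace.
pillar = soup.find("div", class_="field-news-pillars")
raw_text = str(pillar.string)
print(repr(raw_text))

# A plain string filter is compared for equality, so the whitespace makes it fail.
exact = soup.find(class_="field-news-pillars", string="Entrepreneurial Spirit")
print(exact)


def is_pillar(tag):
    """Hypothetical helper: match the pillar div on its whitespace-stripped text."""
    return (
        tag.name == "div"
        and "field-news-pillars" in tag.get("class", [])
        and tag.get_text(strip=True) == "Entrepreneurial Spirit"
    )


match = soup.find(is_pillar)
href = match.find_previous("a")["href"]
print(href)
```

Passing a function to find keeps the match exact (unlike the regex, which also accepts longer strings that merely contain the phrase) while still ignoring the stray whitespace.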
If you have BS4 4.7.1 or later, another way to get the value without a regular expression is to use a CSS selector.
html='''<span class="field-content"><a href="/news/building-business">Building a Business</a></span>
<div> <div>
<span class="date-display-single">Jun 29, 2020</span></div> </div>
<div> <div>
<div class="field-news-pillars">
Entrepreneurial Spirit </div>
</div> </div>'''
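from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')
# Newer soupsieve releases deprecate :contains in favor of :-soup-contains.
item1 = soup.select_one('.field-news-pillars:contains("Entrepreneurial Spirit")')
print(item1.find_previous('a')['href'])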
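Since the question is about filtering every link on the page rather than a single one, the same idea can be looped over all pillar divs. This is only a sketch under assumptions: the second article, its /news/other-story link and the 'Global Mindset' pillar are made up for illustration, and links_for_pillar is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical page with two articles; only the first carries the wanted pillar.
html = """
<span class="field-content"><a href="/news/building-business">Building a Business</a></span>
<div><div><span class="date-display-single">Jun 29, 2020</span></div></div>
<div><div><div class="field-news-pillars">
Entrepreneurial Spirit </div></div></div>
<span class="field-content"><a href="/news/other-story">Another Story</a></span>
<div><div><span class="date-display-single">Jun 30, 2020</span></div></div>
<div><div><div class="field-news-pillars">
Global Mindset </div></div></div>
"""


def links_for_pillar(markup, pillar):
    """Return the href of the nearest preceding <a> for each matching pillar div."""
    soup = BeautifulSoup(markup, "html.parser")
    hrefs = []
    for div in soup.find_all("div", class_="field-news-pillars"):
        # Compare on stripped text so stray newlines and spaces do not matter.
        if div.get_text(strip=True) != pillar:
            continue
        link = div.find_previous("a")
        if link is not None and link.has_attr("href"):
            hrefs.append(link["href"])
    return hrefs


print(links_for_pillar(html, "Entrepreneurial Spirit"))
```

Checking link for None before indexing avoids the same kind of AttributeError or TypeError when a pillar div has no link before it.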