Python: scrape only one URL from the same HTML format

Question

html:

 <li class="dropdown menu-large menu_index_link"><a href="/MainPage" title="A">A</a></li>
 <li class="dropdown menu-large menu_index_link"><a href="/apple" title="1">1</a></li>

Both items have the same HTML format, but I only need the second one. How can I get it? Maybe by distinguishing them by title?

Code:

for item in soup.find_all(attrs={'class':'dropdown menu-large menu_index_link'}):
    for link in item.find_all('a'):
        href = link.get('href')   # yields both links, /MainPage and /apple

Solved as follows:

for item in soup.find_all(attrs={'class':'dropdown menu-large menu_index_link'}):
    for link in item.find_all('a', {'title': "1"}):
        href = link.get('href')   # yields only the link I want, /apple
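
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same fix. The surrounding `<ul>` wrapper, the built-in "html.parser" backend, and the helper name `collect_hrefs` are my assumptions for illustration; they are not part of the original page or code.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question; the <ul> wrapper is assumed.
html = """
<ul>
 <li class="dropdown menu-large menu_index_link"><a href="/MainPage" title="A">A</a></li>
 <li class="dropdown menu-large menu_index_link"><a href="/apple" title="1">1</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")


def collect_hrefs(soup, title=None):
    """Hypothetical helper: hrefs of <a> tags in the menu items, optionally filtered by title."""
    # An empty attrs dict applies no attribute filter at all.
    attrs = {} if title is None else {"title": title}
    hrefs = []
    for item in soup.find_all(attrs={"class": "dropdown menu-large menu_index_link"}):
        for link in item.find_all("a", attrs):
            hrefs.append(link.get("href"))
    return hrefs


all_hrefs = collect_hrefs(soup)            # no filter: both links
wanted = collect_hrefs(soup, title="1")    # title filter: only the second link

print(all_hrefs)  # ['/MainPage', '/apple']
print(wanted)     # ['/apple']
```

Without the title filter the loop collects both links, which is the behaviour described in the question; with it, only /apple is left.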

Answer 1

I see that the title attributes of the a tags are different. You can select the item you want by including a title filter in your find_all call:

item.find_all('a', {'title': "1"})
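
As a rough illustration of that filter, the sketch below shows three ways to express it that should give the same result on the markup from the question. The `<ul>` wrapper and the "html.parser" backend are assumptions, and the CSS selector variant is an alternative I am adding, not something the question or this answer used.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question; the <ul> wrapper is assumed.
html = """
<ul>
 <li class="dropdown menu-large menu_index_link"><a href="/MainPage" title="A">A</a></li>
 <li class="dropdown menu-large menu_index_link"><a href="/apple" title="1">1</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# 1. Attribute dict as the second positional argument (the form used in this answer).
by_attrs = [a.get("href") for a in soup.find_all("a", {"title": "1"})]

# 2. The same filter written as a keyword argument.
by_kwarg = [a.get("href") for a in soup.find_all("a", title="1")]

# 3. Alternative: a CSS selector that also requires the enclosing <li> class.
by_css = [a.get("href") for a in soup.select('li.menu_index_link a[title="1"]')]

print(by_attrs)  # ['/apple']
print(by_kwarg)  # ['/apple']
print(by_css)    # ['/apple']
```

If only one match is expected, `find` (or `select_one`) returns the first matching tag directly instead of a list, or None when nothing matches.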

