How to get the link inside an <h3> on a website with BeautifulSoup4

Question

I am currently writing a script to scrape all the content of the website http://www.xetra.com/xetra-en/newsroom/xetra-newsboard with BeautifulSoup4. So far, I have managed to get all the announcements into lists:

gdata_even = soup.find_all("li", {"class": "list2Col even "})
gdata_odd = soup.find_all("li", {"class": "list2Col odd "})

But I am struggling to get the link (URL) embedded in the href attribute:

 <div class="contentCol">
                 <div class="categories">
                  Frankfurt
                 </div>
                 <h3>
                  <a href="/xetra-en/newsroom/xetra-newsboard/FRA-Deletion-of-Instruments-from-XETRA---24.08.2015-001/1909774">
                   FRA:Deletion of Instruments from XETRA - 24.08.2015-001
                  </a>
                 </h3>
                </div>

Can anyone help?

Thanks

Answer 1

You can try this:

soup.find_all('a', href=True)[0]['href']  # first element

or use a for loop:

for i in soup.find_all('a', href=True):
    print(i['href'])

Update

# Only follow links that sit inside an <h3> within a div.contentCol
for i in soup.find_all("div", attrs={"class": "contentCol"}):
    for j in i.find_all("h3"):
        for k in j.find_all('a', href=True):
            print(k['href'])
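
The nested loops above can also be written as a single CSS selector. The following is only a minimal sketch, not the answerer's code: it parses the HTML fragment from the question as a string instead of downloading the page, and the names HTML, extract_h3_links and h3_links are made up for illustration.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question; a real script would fetch the page first.
HTML = """
<div class="contentCol">
  <div class="categories">Frankfurt</div>
  <h3>
    <a href="/xetra-en/newsroom/xetra-newsboard/FRA-Deletion-of-Instruments-from-XETRA---24.08.2015-001/1909774">
      FRA:Deletion of Instruments from XETRA - 24.08.2015-001
    </a>
  </h3>
</div>
"""


def extract_h3_links(html):
    """Return (href, title) pairs for each <a href> inside an <h3> in a div.contentCol."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # One selector replaces the three nested find_all() loops.
    for anchor in soup.select("div.contentCol h3 a[href]"):
        links.append((anchor["href"], anchor.get_text(strip=True)))
    return links


h3_links = extract_h3_links(HTML)
for href, title in h3_links:
    print(href, "|", title)
```

Whether the selector or the explicit loops read better is a matter of taste; both visit the same anchors.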

Answer 2

itzmeontv has answered your question, but in response to your comment:

for matchDiv in soup.find_all("div", attrs={"class" : "contentCol"}):
    h3Url = matchDiv.find("a").get("href")
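
Two things are worth guarding against with the loop above: find("a") returns None when a div contains no link, and the href in the question is relative to the site root. Here is a hedged sketch of one way to handle both, assuming the newsboard URL from the question as the base. The second div in SAMPLE_HTML and the names collect_announcement_urls and announcement_urls are hypothetical, added only to show the None check.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Assumption: relative links are resolved against the page URL from the question.
BASE_URL = "http://www.xetra.com/xetra-en/newsroom/xetra-newsboard"

# First div mirrors the question's markup; the second (no <a>) is a made-up edge case.
SAMPLE_HTML = """
<div class="contentCol"><h3><a href="/xetra-en/newsroom/xetra-newsboard/FRA-Deletion-of-Instruments-from-XETRA---24.08.2015-001/1909774">FRA:Deletion of Instruments from XETRA - 24.08.2015-001</a></h3></div>
<div class="contentCol"><h3>Heading without a link</h3></div>
"""


def collect_announcement_urls(html, base_url=BASE_URL):
    """Collect one absolute URL per div.contentCol that contains a link."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for match_div in soup.find_all("div", attrs={"class": "contentCol"}):
        anchor = match_div.find("a", href=True)
        if anchor is None:
            # No link in this block; skip it instead of failing on None.
            continue
        urls.append(urljoin(base_url, anchor["href"]))
    return urls


announcement_urls = collect_announcement_urls(SAMPLE_HTML)
print(announcement_urls)
```

Collecting the results in a list also avoids overwriting a single variable on every iteration.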

python

beautifulsoup