Extract only image links from BeautifulSoup output

Question

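I'm new to BeautifulSoup, and I've been trying to extract every image link from a web page using bs4 and requests. However, when I print the image links, I get HTML rather than a direct link to each image.

I tried switching from 'find' to 'findAll', but that did not solve the problem.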

import bs4
import requests

req = requests.get('https://www.gnu.org/home.en.html')

soup = bs4.BeautifulSoup(req.text, features='html.parser')

html = (soup.findAll('img'))

print(html)

I expected the output to be web URLs such as https://www.gnu.org/distros/screenshots/guixSD-gnome3-medium.jpg, but instead I get HTML that looks like this:

[<img alt=" [A GNU head] " src="/graphics/heckert_gnu.transp.small.png"/>,

Answer 1

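findAll returns a list of tag objects, which is why printing it shows HTML. The link itself, which may be relative, is in each tag's src attribute. You can use: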

for im in html:
    print(im['src'])

Then join each value with the base URL to get the full URL.
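As a minimal sketch of that joining step, the standard library's urllib.parse.urljoin can resolve each src value against the page URL. The helper name to_absolute_urls and the sample list below are hypothetical, used only for illustration; in the question's code the src values would come from im['src'] for each tag in html.

```python
from urllib.parse import urljoin


def to_absolute_urls(srcs, base_url):
    """Resolve each src value against base_url (hypothetical helper)."""
    # urljoin handles root-relative paths, page-relative paths,
    # and leaves already-absolute URLs unchanged.
    return [urljoin(base_url, src) for src in srcs]


# Page URL taken from the question.
base_url = 'https://www.gnu.org/home.en.html'

# Sample src values; the first is from the question's output,
# the others are assumed here to show the other two cases.
srcs = [
    '/graphics/heckert_gnu.transp.small.png',
    'distros/screenshots/guixSD-gnome3-medium.jpg',
    'https://www.gnu.org/graphics/heckert_gnu.transp.small.png',
]

for url in to_absolute_urls(srcs, base_url):
    print(url)
```

With the question's soup object, the list could be built as [im['src'] for im in html]; using im.get('src') instead would skip a KeyError for any img tag that happens to lack a src attribute.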

python

beautifulsoup

html-parsing

python-requests