I am struggling to scrape the correct URL with BeautifulSoup

Question

I'm writing a web scraper and I'm struggling to get an href link from a web page. The page URL is https://vcnewsdaily.com/Tessera%20Therapeutics/venture-funding.php, and I'm trying to get the href link https://www.tesseratherapeutics.com from the following part of the page:

<a class="text-border-botton-color " target="_blank" href="https://www.tesseratherapeutics.com/">https://www.tesseratherapeutics.com/</a>

Here is my code:

from cgi import print_directory
import pandas as pd
import os
import requests
from bs4 import BeautifulSoup
import re

URL = "https://vcnewsdaily.com/Tessera%20Therapeutics/venture-funding.php"
page = requests.get(URL)
soup = BeautifulSoup(page.text, "html.parser")
links = []

for link in soup.findAll(class_="text-border-botton-color "):
    links.append(link.get("href"))
print(links)

When I run my code, I get this:

[]

Can anyone help me get the correct href link?

Thanks!

Answer 1

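As @ggorlen pointed out, it's a typo: search for "text-border-botton-color", not "text-border-botton-color ". In other words, remove the extra space after "color". BeautifulSoup splits the class attribute on whitespace, so the trailing space in the page's HTML is discarded, and a search term that still contains the space matches no class value.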
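The following offline sketch reproduces both searches against the anchor tag quoted in the question, so no network request is needed. It assumes only that bs4 is installed; the variable names are illustrative.

```python
from bs4 import BeautifulSoup

# The anchor tag from the question, copied verbatim (note the trailing space
# inside the class attribute).
html = (
    '<a class="text-border-botton-color " target="_blank" '
    'href="https://www.tesseratherapeutics.com/">https://www.tesseratherapeutics.com/</a>'
)
soup = BeautifulSoup(html, "html.parser")

# "class" is a multi-valued attribute: BeautifulSoup splits it on whitespace,
# so the trailing space from the HTML does not survive parsing.
classes = soup.a["class"]

# Class values are compared as whole strings, so a search term that still
# carries the trailing space finds nothing...
with_space = soup.find_all(class_="text-border-botton-color ")

# ...while the exact class name matches the tag.
without_space = [a["href"] for a in soup.find_all(class_="text-border-botton-color")]

print(classes)        # ['text-border-botton-color']
print(with_space)     # []
print(without_space)  # ['https://www.tesseratherapeutics.com/']
```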

import requests
from bs4 import BeautifulSoup

URL = "https://vcnewsdaily.com/Tessera%20Therapeutics/venture-funding.php"
page = requests.get(URL)
soup = BeautifulSoup(page.text, "html.parser")
links = []

# No trailing space: each class value is matched as a whole string.
for link in soup.find_all(class_="text-border-botton-color"):
    links.append(link.get("href"))
print(links)

Output:

['https://www.tesseratherapeutics.com/']
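If the class name comes from copied HTML, a slightly more defensive variant can strip stray whitespace and keep only `<a>` tags that actually have an href. The sketch below wraps this in a hypothetical helper, `links_with_class`, which takes HTML text rather than a URL so it can be checked offline. Stripping the class name is a design choice that quietly tolerates the typo from the question; drop the `.strip()` if you would rather get an empty result.

```python
from bs4 import BeautifulSoup


def links_with_class(html: str, class_name: str) -> list[str]:
    """Return the href of every <a> tag carrying `class_name` (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # Individual class values never contain whitespace, so strip the search term.
    # href=True skips <a> tags that have the class but no href attribute.
    tags = soup.find_all("a", class_=class_name.strip(), href=True)
    return [tag["href"] for tag in tags]


# Offline check: the tag quoted in the question, plus one without an href.
sample = (
    '<a class="text-border-botton-color " target="_blank" '
    'href="https://www.tesseratherapeutics.com/">https://www.tesseratherapeutics.com/</a>'
    '<a class="text-border-botton-color">no href here</a>'
)
print(links_with_class(sample, "text-border-botton-color "))
# ['https://www.tesseratherapeutics.com/']
```

Against the live page, you would pass the downloaded HTML instead, for example `links_with_class(requests.get(URL).text, "text-border-botton-color")`.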


Tags: html, python, beautifulsoup