I am struggling to scrape the correct URL with BeautifulSoup
I am writing a web scraper and am struggling to get an href link from a web page. The URL is https://vcnewsdaily.com/Tessera%20Therapeutics/venture-funding.php and I am trying to get this href link: https://www.tesseratherapeutics.com. It comes from the following part of the site:
<a class="text-border-botton-color " target="_blank" href="https://www.tesseratherapeutics.com/">https://www.tesseratherapeutics.com/</a>
Here is my code:
from cgi import print_directory
import pandas as pd
import os
import requests
from bs4 import BeautifulSoup
import re
URL = "https://vcnewsdaily.com/Tessera%20Therapeutics/venture-funding.php"
page = requests.get(URL)
soup = BeautifulSoup(page.text, "html.parser")
links = []
for link in soup.findAll(class_="text-border-botton-color "):
    links.append(link.get("href"))
print(links)
When I run my code, I get this:
[]
Can anyone help me find the correct href link?
Thanks!
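@ggorlen pointed out the typo: the class should be "text-border-botton-color",
not "text-border-botton-color ".
That means you have to remove the extra space after "color". BeautifulSoup splits the class attribute on whitespace when it parses the page, so the class it stores never contains the trailing space, and a search string that includes it matches nothing.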
import requests
from bs4 import BeautifulSoup

URL = "https://vcnewsdaily.com/Tessera%20Therapeutics/venture-funding.php"
page = requests.get(URL)
soup = BeautifulSoup(page.text, "html.parser")

# Collect the href of every tag with this class (no trailing space in the name).
links = []
for link in soup.find_all(class_="text-border-botton-color"):
    links.append(link.get("href"))
print(links)
Output:
['https://www.tesseratherapeutics.com/']
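
If you want to check this behaviour without hitting the site, here is a minimal offline sketch. It assumes the anchor tag looks exactly like the one quoted in the question; the `HTML` constant and the `hrefs_by_class` helper are hypothetical names made up for this illustration, not part of BeautifulSoup. It compares the search with and without the trailing space, and also tries a CSS selector as one possible alternative.

```python
from bs4 import BeautifulSoup

# Inline copy of the anchor from the question, so no network request is needed.
HTML = (
    '<a class="text-border-botton-color " target="_blank" '
    'href="https://www.tesseratherapeutics.com/">'
    "https://www.tesseratherapeutics.com/</a>"
)


def hrefs_by_class(html, class_name):
    """Hypothetical helper: return the href of every tag carrying class_name."""
    parsed = BeautifulSoup(html, "html.parser")
    # strip() guards against whitespace copied along with the class name.
    return [tag.get("href") for tag in parsed.find_all(class_=class_name.strip())]


soup = BeautifulSoup(HTML, "html.parser")

# The class attribute is split on whitespace when parsed, so the stored
# class list does not keep the trailing space from the raw HTML.
print(soup.a["class"])

# Same search as in the question (with the space) and in the fix (without it).
with_space = [a.get("href") for a in soup.find_all(class_="text-border-botton-color ")]
without_space = [a.get("href") for a in soup.find_all(class_="text-border-botton-color")]
print(with_space)
print(without_space)

# A CSS selector is another way to express the same lookup.
via_select = [a["href"] for a in soup.select("a.text-border-botton-color")]
print(via_select)

# The helper tolerates a class name pasted with stray whitespace.
print(hrefs_by_class(HTML, "text-border-botton-color "))
```

Stripping the class name inside a helper like this is only a convenience for names pasted straight from the page source; removing the space at the call site, as in the fix above, works just as well.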