How do I scrape the name/text of hyperlinks using Python?
I want to extract the names of the links from this URL, https://www.ccexpert.us/ccda/best-practices-for-hierarchical-layers.html, but I can't work out the next step. Here is my code so far:
import requests as re
from bs4 import BeautifulSoup
URL = "https://www.ccexpert.us/ccda/best-practices-for-hierarchical-layers.html"
page = re.get(URL)
soup = BeautifulSoup(page.content, "html.parser")
results = soup.find(class_="post altr")
for result in results:
    print(result)
I still don't know how to proceed. Any help is much appreciated. Thanks.
This code gets the text of every link on the page:
import requests
from bs4 import BeautifulSoup

URL = "https://www.ccexpert.us/ccda/best-practices-for-hierarchical-layers.html"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")
# Every hyperlink is an <a> element; find_all returns all of them.
results = soup.find_all('a')
for result in results:
    # .text is the visible link text; strip() trims surrounding whitespace.
    print(result.text.strip())
Output:
CCDA
port channels
RPVST
Dynamic Trunking Protocol
VTP transparent mode
Layer 3 load balancing
user ports
enable PortFast
the core layer
link redundancy
access layer switches
Gateway Load Balancing Protocol
core switches
distribution switches
redundant paths
campus core
Large Building LANs
LAN Design Types and Models
Shutting Down a BGP Neighbor
Core Layer Functionality - Network Design
Distribution Layer Functionality
Characterizing Types of Traffic Flow for New Network Applications
DHCP Starvation and Spoofing Attacks
How to Start an Ecommerce Business
Reply
About
Contact
Advertise
Privacy Policy
Resources
This works because hyperlinks in HTML are created with the <a> tag. I believe what you want is the text that carries the hyperlink, but if you want the links themselves, do the following:
import requests
from bs4 import BeautifulSoup

URL = "https://www.ccexpert.us/ccda/best-practices-for-hierarchical-layers.html"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")
# href=True keeps only the <a> tags that actually have an href attribute.
for a in soup.find_all('a', href=True):
    print(a['href'])
Output:
/
/reviews/traffic-xtractor.html
/ccda/
/routing-switching/using-routed-ports-and-portchannels-with-mls.html
/root-bridge/rapid-pervlan-spanning-tree-protocol.html
/network-security-2/dynamic-trunking-protocol-dtp.html
/root-bridge/vtp-modes.html
/root-bridge/configuring-etherchannel-load-balancing.html
/routing-switching-2/switch-security-best-practices-for-unused-and-user-ports.html
/global-configuration/enabling-bpdu-guard.html
/network-design/core-layer-functionality.html
/network-design/designing-link-redundancy.html
/network-design/access-layer-functionality.html
/root-bridge/gateway-load-balancing-protocol.html
/switching/collapsed-core.html
/switching/distribution-layer-switches.html
/switching/backbonefast-redundant-backbone-paths.html
/network-design/campus-core-design-considerations.html
/ccda/largebuilding-lans.html
/ccda/lan-design-types-and-models.html
/cisco-internetworks-2/shutting-down-a-bgp-neighbor.html
/network-design/core-layer-functionality.html
/network-design/distribution-layer-functionality.html
/network-design-2/characterizing-types-of-traffic-flow-for-new-network-applications.html
/snrs-3/dhcp-starvation-and-spoofing-attacks.html
/ecommerce.html
/about/
/contact/
/advertise-with-us/
/privacy-policy/
/resources/
This scrapes only the 'href' of each <a> tag.
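If you want the link text and the link target together, and only for links inside the post body rather than the site navigation, the two snippets can be combined. The following is a minimal sketch, not a tested scraper for this site: SAMPLE_HTML is a made-up stand-in for the downloaded page, extract_links is a hypothetical helper name, and the assumption that the article links sit inside an element with both the "post" and "altr" classes comes only from the class name used in the question. It runs offline; for the real page you would pass page.content instead of SAMPLE_HTML.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; the real markup may differ.
SAMPLE_HTML = """
<div class="post altr">
  <p>Use <a href="/root-bridge/vtp-modes.html"> VTP transparent mode </a> and
  <a href="/ccda/largebuilding-lans.html">Large Building LANs</a>.</p>
  <a href="/empty.html"></a>
  <a name="anchor-without-href">no href here</a>
</div>
<div class="footer"><a href="/about/">About</a></div>
"""

BASE_URL = "https://www.ccexpert.us/ccda/best-practices-for-hierarchical-layers.html"


def extract_links(html, base_url, container_selector=None):
    """Return (text, absolute_url) pairs for the <a href> elements in html."""
    soup = BeautifulSoup(html, "html.parser")
    # Optionally narrow the search to links inside a container (CSS selector).
    selector = f"{container_selector} a[href]" if container_selector else "a[href]"
    links = []
    for a in soup.select(selector):
        text = a.get_text(strip=True)
        if not text:
            continue  # skip links with no visible text (e.g. image-only links)
        # The hrefs on the page are relative, so resolve them against the page URL.
        links.append((text, urljoin(base_url, a["href"])))
    return links


for text, url in extract_links(SAMPLE_HTML, BASE_URL, ".post.altr"):
    print(f"{text} -> {url}")
```

Calling extract_links without a container selector returns every link on the page, which matches the behaviour of the find_all('a') snippets above.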