Get href link with Selenium (Python)
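I want to get all the hrefs from the child elements.
The parent class is search-content.
It contains parent divs with the class card-col,
and inside each of these divs there is one more div and then the link. I only want the href of that link.
Here is my code: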
el = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME, "search-content-cards")))
el_hrefs = el.find_elements_by_xpath(".//a[@href]")
for i in el_hrefs:
    print(i)
This prints a lot of elements:
<selenium.webdriver.remote.webelement.WebElement (session="0a4b52d1575e427e34d6b790a284c501", element="f7c84bf8-c20c-4b70-8ba5-414e822bba21")>
<selenium.webdriver.remote.webelement.WebElement (session="0a4b52d1575e427e34d6b790a284c501", element="1e078e8b-104f-4299-94b1-8741cf30f047")>
<selenium.webdriver.remote.webelement.WebElement (session="0a4b52d1575e427e34d6b790a284c501", element="d8b4b5e0-6291-4fd2-ae04-faee245462d1")>
<selenium.webdriver.remote.webelement.WebElement (session="0a4b52d1575e427e34d6b790a284c501", element="ef06e8ac-321c-40db-9f6c-40dd3a3b07de")>
<selenium.webdriver.remote.webelement.WebElement (session="0a4b52d1575e427e34d6b790a284c501", element="e14cf667-1bf4-434c-b9a2-1c4f362398d2")>
<selenium.webdriver.remote.webelement.WebElement (session="0a4b52d1575e427e34d6b790a284c501", element="8e549221-eca4-41cf-943d-3cb0f6f75d50")>
<selenium.webdriver.remote.webelement.WebElement (session="0a4b52d1575e427e34d6b790a284c501", element="afd597fb-1bb0-48fb-8646-6c43cb17ab38")>
<selenium.webdriver.remote.webelement.WebElement (session="0a4b52d1575e427e34d6b790a284c501", element="8f3a655e-d3cd-4748-934a-2c9000481ed3")>
<selenium.webdriver.remote.webelement.WebElement (session="0a4b52d1575e427e34d6b790a284c501", element="a1706e30-fad0-4799-871f-c5a928c69009")>
<selenium.webdriver.remote.webelement.WebElement (session="0a4b52d1575e427e34d6b790a284c501", element="156847c9-5d7f-4963-82fa-baaf2b8f6e7f")>
<selenium.webdriver.remote.webelement.WebElement (session="0a4b52d1575e427e34d6b790a284c501", element="99c320b4-f6f1-4eb4-abec-4be4df790b71")>
Can anyone help me?
In your example, i is a web element. To extract its text you should not just print i; it should be print(i.text).
Also, if you want to extract the href from an a tag, then you should use .get_attribute('href').
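To make the difference concrete, here is a minimal sketch of a helper that reads both values from each element. The name link_details is made up for illustration, and the code only assumes the objects behave like Selenium WebElements, that is, they expose a .text property and a .get_attribute() method.

```python
def link_details(elements):
    """Return (visible text, href) pairs for WebElement-like objects.

    Hypothetical helper: it relies only on the .text property and the
    .get_attribute() method that Selenium's WebElement provides.
    """
    details = []
    for element in elements:
        # .text is the rendered text of the element; printing the element
        # itself only shows its repr (the session and element ids).
        text = element.text
        # get_attribute returns None when the attribute is absent.
        href = element.get_attribute("href")
        details.append((text, href))
    return details
```

With the code from the question, this could be called as link_details(el_hrefs).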
Secondly, I think you should use CSS_SELECTOR with div.search-content-cards instead of CLASS_NAME.
Also, the a tag is a descendant.
So your effective code should look like this:
el = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, "div.search-content-cards")))
el_hrefs = el.find_elements_by_xpath(".//descendant::a[@href]")
for i in el_hrefs:
    print(i.get_attribute('href'))
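One caveat: find_elements_by_xpath was deprecated in Selenium 4 and removed in 4.3, where the generic find_elements(by, value) call replaces it. Below is a rough sketch of the same loop written against that call. The function name collect_hrefs is an assumption for illustration, and the container can be the driver or any element that was already located.

```python
def collect_hrefs(container, by, value):
    """Return the href of every element found under `container`.

    Hypothetical helper. `container` is a WebDriver or WebElement (anything
    with Selenium's find_elements(by, value) method); `by` and `value` form
    the locator, e.g. By.XPATH and ".//a[@href]".
    """
    hrefs = []
    for anchor in container.find_elements(by, value):
        href = anchor.get_attribute("href")
        if href:  # skip elements whose href is missing or empty
            hrefs.append(href)
    return hrefs
```

Assuming el is the element returned by the wait above and By is imported from selenium.webdriver.common.by, collect_hrefs(el, By.XPATH, ".//a[@href]") would return the links as a list of strings.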
To extract the values of the href attributes, you need to induce WebDriverWait for visibility_of_all_elements_located(), and you can use either of the following:
Using CSS_SELECTOR:
print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.search-content-cards a.d-block")))])
Using XPATH:
print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[contains(@class, 'search-content-cards')]//a[contains(@class, 'd-block')]")))])
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC