Selenium WebDriver get_attribute returns truncated value of href attribute when value has entities
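I am trying to fetch the value of the href attribute from an anchor tag on a page in my application using Selenium WebDriver (Python), and the returned value has part of it stripped out.
Here is the HTML snippet -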
<a class="nla-row-text" href="/shopping/brands?search=kamera&amp;nm=Canon&amp;page=0" data-reactid="790">
Here is the code I am using -
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
driver = webdriver.Firefox()
driver.get("xxxx")
url_from_attr = driver.find_element(By.XPATH,"(//div[@class='nla-children mfr']/div/div/a)[1]").get_attribute("href")
url_from_attr_raw = "%r"%url_from_attr
print(" URL from attribute -->> " + url_from_attr)
print(" Raw string -->> " + url_from_attr_raw)
The output I get is -
/shopping/brands?search=kamera&page=0
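instead of -
/shopping/brands?search=kamera&amp;nm=Canon&amp;page=0 OR
/shopping/brands?search=kamera&nm=Canon&page=0
Is this caused by the entity representation in the URL? I notice that the part between the entities is what gets stripped. Any help or pointers would be great.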
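Going by the HTML you have shared, there is an issue with the Locator Strategy you have tried. You have used the index [1] together with find_element, which is error-prone. An index such as [1] can be applied when a List is returned through find_elements. In this use case, the optimized expression would be: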
url_from_attr = driver.find_element(By.XPATH,"//div[@class='nla-children mfr']/div/div/a[@class='nla-row-text']").get_attribute("href")
The Locator Strategy can be optimized further as follows:
url_from_attr = driver.find_element(By.XPATH,"//div[@class='nla-children mfr']//a[@class='nla-row-text']").get_attribute("href")
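Update A
As per your comment, if you still need to use an index, the optimized Locator Strategy can be the following. Note that find_elements returns a list, so the index is applied to the list before calling get_attribute:
url_from_attr = driver.find_elements(By.XPATH,"//div[@class='nla-children mfr']//a[@class='nla-row-text']")[0].get_attribute("href")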
get_attribute(attribute_name)
As per the Python-API Source:
def get_attribute(self, name):
    """Gets the given attribute or property of the element.

    This method will first try to return the value of a property with the
    given name. If a property with that name doesn't exist, it returns the
    value of the attribute with the same name. If there's no attribute with
    that name, ``None`` is returned.

    Values which are considered truthy, that is equals "true" or "false",
    are returned as booleans. All other non-``None`` values are returned
    as strings. For attributes or properties which do not exist, ``None``
    is returned.

    :Args:
        - name - Name of the attribute/property to retrieve.

    Example::

        # Check if the "active" CSS class is applied to an element.
        is_active = "active" in target_element.get_attribute("class")
    """
    attributeValue = ''
    if self._w3c:
        attributeValue = self.parent.execute_script(
            "return (%s).apply(null, arguments);" % getAttribute_js,
            self, name)
    else:
        resp = self._execute(Command.GET_ELEMENT_ATTRIBUTE, {'name': name})
        attributeValue = resp.get('value')
        if attributeValue is not None:
            if name != 'value' and attributeValue.lower() in ('true', 'false'):
                attributeValue = attributeValue.lower()
    return attributeValue
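Since get_attribute may hand back the property value rather than the literal attribute text, comparing raw strings can be misleading. A minimal sketch, using only the standard library, of comparing the query parameters instead; the helper name query_params is hypothetical, and the two URLs are the ones from the question:

```python
from html import unescape
from urllib.parse import urlsplit, parse_qs


def query_params(href):
    """Return the query parameters of href as a dict mapping names to value lists."""
    # Decode entities such as &amp; in case the string was copied from page source.
    return parse_qs(urlsplit(unescape(href)).query)


# URL as written in the HTML source (with entities) vs. the value that was returned.
expected = query_params("/shopping/brands?search=kamera&amp;nm=Canon&amp;page=0")
actual = query_params("/shopping/brands?search=kamera&page=0")

# Parameters present in the source but absent from the returned value.
missing = {key: value for key, value in expected.items() if key not in actual}
print(missing)
```

This only reports which parameters differ; it does not explain why the value changed.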
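Update B
As you mentioned in your comments, the URL value returned by the method does not exist anywhere on the page, which suggests you are trying to access the href attribute too early. So there can be 2 solutions, as follows:
Traverse the DOM tree and construct a Locator that uniquely identifies the element, induce WebDriverWait with expected_conditions set to element_to_be_clickable, and then extract the href attribute.
For debugging purposes, you can add time.sleep(10) so that the element gets rendered properly within the HTML DOM, and then try to extract the href attribute.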