使用 Python 中的硒从 <li> 项中提取文本
Extracting texts from <li> items with selenium in Python
我正在尝试获取嵌套 ul-li 结构中 /a 标记内的文本。我找到了所有 "li",但无法获取 a 中的文本。
我正在使用 Python 3.7 和带有 Firefox 驱动程序的 Selenium webdriver。
对应的HTML为:
[some HTML]
<ul class="dropdown-menu inner">
<!---->
<li nya-bs-option="curso in ctrl.cursos group by curso.grupo" class="nya-bs-option first-in-group group-item">
<span class="dropdown-header">Cursos em Destaque</span>
<a tabindex="0">Important TEXT 1</a>
</li>
<!-- end nyaBsOption: curso in ctrl.cursos group by curso.grupo -->
<li nya-bs-option="curso in ctrl.cursos group by curso.grupo" class="nya-bs-option group-item">
<span class="dropdown-header">Cursos em Destaque</span>
<a tabindex="0">Important TEXT 2</a>
</li>
<!-- end nyaBsOption: curso in ctrl.cursos group by curso.grupo -->
<li nya-bs-option="curso in ctrl.cursos group by curso.grupo" class="nya-bs-option group-item">
<span class="dropdown-header">Cursos em Destaque</span>
<a tabindex="0">Important TEXT 3</a>
</li>
<!-- end nyaBsOption: curso in ctrl.cursos group by curso.grupo -->
<li nya-bs-option="curso in ctrl.cursos group by curso.grupo" class="nya-bs-option group-item">
<span class="dropdown-header">Cursos em Destaque</span>
<a tabindex="0">Important TEXT4</a>
</li>
[another 100 <li></li> similar blocks] .
.
<li class="no-search-result" placeholder="Curso">
<span>Unimportant TEXT</span>
</li>
</ul>
[more HTML]
我试过下面的代码:
cursos = browser.find_elements_by_xpath('//li[@nya-bs-option="curso in ctrl.cursos group by curso.grupo"]')
nome_curso = [curso.find_element_by_tag_name('a').text for curso in cursos]
我得到了包含正确数量项目的列表,但所有项目 = ''。谁能帮我?谢谢
看来你很接近。提取文本,例如重要文本 1、重要文本 2、重要文本 3、重要文本 4,等等你必须为所需的 visibility_of_all_elements_located()
引入 WebDriverWait 并且你可以使用以下任一 :
使用CSS_SELECTOR
和get_attribute()
方法:
print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "ul.dropdown-menu.inner li.nya-bs-option a")))])
使用 XPATH
和 text
属性:
print([my_elem.text for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.XPATH, "//ul[@class='dropdown-menu inner']//li[contains(@class, 'nya-bs-option')]//a")))])
注意:您必须添加以下导入:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
You can find a relevant discussion in
结尾
根据文档:
get_attribute()
方法Gets the given attribute or property of the element.
text
属性returnsThe text of the element.
- Difference between text and innerHTML using Selenium
我正在尝试获取嵌套 ul-li 结构中 /a 标记内的文本。我找到了所有 "li",但无法获取 a 中的文本。
我正在使用 Python 3.7 和带有 Firefox 驱动程序的 Selenium webdriver。
对应的HTML为:
[some HTML]
<ul class="dropdown-menu inner">
<!---->
<li nya-bs-option="curso in ctrl.cursos group by curso.grupo" class="nya-bs-option first-in-group group-item">
<span class="dropdown-header">Cursos em Destaque</span>
<a tabindex="0">Important TEXT 1</a>
</li>
<!-- end nyaBsOption: curso in ctrl.cursos group by curso.grupo -->
<li nya-bs-option="curso in ctrl.cursos group by curso.grupo" class="nya-bs-option group-item">
<span class="dropdown-header">Cursos em Destaque</span>
<a tabindex="0">Important TEXT 2</a>
</li>
<!-- end nyaBsOption: curso in ctrl.cursos group by curso.grupo -->
<li nya-bs-option="curso in ctrl.cursos group by curso.grupo" class="nya-bs-option group-item">
<span class="dropdown-header">Cursos em Destaque</span>
<a tabindex="0">Important TEXT 3</a>
</li>
<!-- end nyaBsOption: curso in ctrl.cursos group by curso.grupo -->
<li nya-bs-option="curso in ctrl.cursos group by curso.grupo" class="nya-bs-option group-item">
<span class="dropdown-header">Cursos em Destaque</span>
<a tabindex="0">Important TEXT4</a>
</li>
[another 100 <li></li> similar blocks] .
.
<li class="no-search-result" placeholder="Curso">
<span>Unimportant TEXT</span>
</li>
</ul>
[more HTML]
我试过下面的代码:
cursos = browser.find_elements_by_xpath('//li[@nya-bs-option="curso in ctrl.cursos group by curso.grupo"]')
nome_curso = [curso.find_element_by_tag_name('a').text for curso in cursos]
我得到了包含正确数量项目的列表,但所有项目 = ''。谁能帮我?谢谢
看来你很接近。提取文本,例如重要文本 1、重要文本 2、重要文本 3、重要文本 4,等等你必须为所需的 visibility_of_all_elements_located()
引入 WebDriverWait 并且你可以使用以下任一
使用
CSS_SELECTOR
和get_attribute()
方法:print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "ul.dropdown-menu.inner li.nya-bs-option a")))])
使用
XPATH
和text
属性:print([my_elem.text for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.XPATH, "//ul[@class='dropdown-menu inner']//li[contains(@class, 'nya-bs-option')]//a")))])
注意:您必须添加以下导入:
from selenium.webdriver.support.ui import WebDriverWait from selenium.webdriver.common.by import By from selenium.webdriver.support import expected_conditions as EC
You can find a relevant discussion in
结尾
根据文档:
get_attribute()
方法Gets the given attribute or property of the element.
text
属性returnsThe text of the element.
- Difference between text and innerHTML using Selenium