Match all <td> of a specific raw table with Webdriver Selenium - Python
I'm still new to web scraping, and I have a question related to Webdriver.
Code sample:
<table>
<tbody>
<tr>
<td> car </td>
<td> bus </td>
</tr>
<tr>
<td> car </td>
<td> bus & train </td>
</tr>
<tr>
<td> car </td>
<td> bus & plane </td>
</tr>
</tbody>
</table>
<table>
<tbody>
<tr>
<td> food </td>
<td> meat</td>
</tr>
<tr>
<td> drink </td>
<td> water </td>
</tr>
</tbody>
</table>
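The point is that in my original code I have multiple tables with the same ID and class names.
Question: how can I extract all the TRs that contain the word "bus"?
I can't find the right XPath syntax to use.
Using BeautifulSoup: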
html = """<table>
<tbody>
<tr>
<td> car </td>
<td> bus </td>
</tr>
<tr>
<td> car </td>
<td> bus & train </td>
</tr>
<tr>
<td> car </td>
<td> bus & plane </td>
</tr>
</tbody>
</table>
<table>
<tbody>
<tr>
<td> food </td>
<td> meat</td>
</tr>
<tr>
<td> drink </td>
<td> water </td>
</tr>
</tbody>
</table>"""
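from bs4 import BeautifulSoup

# Parse the markup (the "lxml" parser requires the lxml package)
soup = BeautifulSoup(html, "lxml")
# Collect every <td>, then keep those whose text contains "bus"
temp = soup.find_all("td")
output = [x for x in temp if "bus" in x.text]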
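The list comprehension above collects matching <td> cells, while the question asks for whole rows. One possible way to group cells by <tr> is sketched below using only the standard library's html.parser, as a stand-in for cases where bs4 or lxml is not installed. The names RowCollector, rows_containing and sample are hypothetical and exist only for this illustration; the sample markup is a shortened copy of the tables in the question.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect every <tr> as a list of its <td> texts (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the <tr> being read, or None
        self._cell = None  # text fragments of the <td> being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a <td>
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def rows_containing(html, word):
    """Return the rows in which at least one cell contains `word`."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    return [row for row in parser.rows if any(word in cell for cell in row)]


sample = """<table><tbody>
<tr><td> car </td><td> bus </td></tr>
<tr><td> car </td><td> bus & train </td></tr>
</tbody></table>
<table><tbody>
<tr><td> food </td><td> meat</td></tr>
</tbody></table>"""

print(rows_containing(sample, "bus"))
```

Each returned row is a plain list of stripped cell strings, so the neighbouring "car" cell comes along with the cell that matched.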
//td[contains(text(),'bus')]
You can use contains() on the text; this returns every td that contains "bus".
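To create a list of all the <tr> elements that contain the text bus, along with their child <td> elements, you can use the following XPath-based locator strategy: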
elements = driver.find_elements(By.XPATH, "//tr[.//td[contains(., 'bus')]]")
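Because the question mentions several tables that share the same ID and class, it may help to scope the row expression to one table by its position in the document. The following is a minimal sketch that only builds the XPath string; xpath_literal, build_row_xpath and table_index are hypothetical names for this illustration and are not part of Selenium.

```python
def xpath_literal(text):
    """Quote `text` as an XPath 1.0 string literal."""
    if "'" not in text:
        return "'" + text + "'"
    if '"' not in text:
        return '"' + text + '"'
    # Both quote characters are present: stitch the pieces with concat()
    parts = text.split("'")
    return "concat(" + ", \"'\", ".join("'" + p + "'" for p in parts) + ")"


def build_row_xpath(word, table_index=None):
    """Build an XPath for <tr> elements with a <td> containing `word`.

    table_index is 1-based (XPath positions start at 1); None means
    rows from every table are matched.
    """
    rows = "//tr[.//td[contains(., " + xpath_literal(word) + ")]]"
    if table_index is None:
        return rows
    return "(//table)[" + str(table_index) + "]" + rows


print(build_row_xpath("bus"))
print(build_row_xpath("bus", table_index=1))
```

The resulting string can be passed wherever the literal XPath is used in this answer. Selecting by position assumes the order of the tables on the page is stable.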
Ideally, you should induce WebDriverWait for visibility_of_all_elements_located(), using the following locator strategy:
elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//tr[.//td[contains(., 'bus')]]")))
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC