How can I get texts with certain criteria in Python with Selenium? (texts with certain siblings)
This is really tricky for me, so I'll describe the problem in as much detail as I can.
First, let me show you a sample of the HTML.
....
....
<div class="lawcon">
<p>
<span class="b1">
<label> No.1 </label>
</span>
</p>
<p>
"I Want to get 'No.1' label in span if the div[@class='lawcon'] has a certain <a> tags with "bb" title, and with a string of 'Law' in the text of it."
<a title="bb" class="link" onclick="javascript:blabla('12345')" href="javascript:;">Law Power</a>
</p>
</div>
<div class="lawcon">
<p>
<span class="b1">
<label> No.2 </label>
</p>
<p>
"But I don't want to get No.2 label because, although it has <a> tag with "bb" title, but it doesn't have a text of law in it"
<a title="bb" class="link" onclick="javascript:blabla('12345')" href="javascript:;">Just Power</a>
</p>
</div>
<div class="lawcon">
<p>
<span class="b1">
<label> No.3 </label>
</p>
<p>
"If there are multiple <a> tags with the right criteria in a single div, I want to get span(No.3) for each of those" <a>
<a title="bb" class="link" onclick="javascript:blabla('12345')" href="javascript:;">Lawyer</a>
<a title="bb" class="link" onclick="javascript:blabla('12345')" href="javascript:;">By the Law</a>
<a title="bb" class="link" onclick="javascript:blabla('12345')" href="javascript:;">But not this one</a>
...
...
...
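So, here's the thing. I want to extract the label text (e.g. No.1) inside a div[@class='lawcon'], but only if that div contains an <a> tag with the title "bb" whose text includes the string 'Law'.
If the div has no <a> tag with the title "bb", or the tag's text does not contain 'Law', the span should not be collected.
What I tried is: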
div_list = [div.text for div in driver.find_elements_by_xpath('//span[following-sibling::a[@title="bb"]]')]
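The problem is that when a single div contains multiple tags matching the criteria, it returns only one element for that div.
What I want is a list (or tuple) pairing the span's number with the text of each matching tag.
So it should look like this: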
[[No.1 - Law Power], [No.3 - Lawyer], [No.3 - By the Law]]
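I'm not sure I've explained this well enough. Thanks for your interest, and I hope you can enlighten me with your knowledge! Thanks a lot.
Here is a simple Python script to get the output you want.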
links = driver.find_elements_by_xpath("//a[@title='bb' and contains(.,'Law')]")
linkData = []
for link in links:
    currentList = []
    currentList.append(link.find_element_by_xpath("./ancestor::div[@class='lawcon']//label").text + '-' + link.text)
    linkData.append(currentList)
print(linkData)
Output:
[['No.1-Law Power'], ['No.3-Lawyer'], ['No.3-By the Law']]
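The pairing logic above can be checked offline, without a browser. The following is only a sketch under stated assumptions: it uses the standard library's `xml.etree.ElementTree` instead of Selenium, it runs on a trimmed, well-formed copy of the sample markup (closing tags added, explanatory text and `onclick`/`href` attributes dropped), and the names `SAMPLE` and `law_pairs` are hypothetical. It returns tuples, since the question allowed a list or tuple.

```python
import xml.etree.ElementTree as ET

# Assumption: a cleaned-up, well-formed version of the sample HTML from the question.
SAMPLE = """
<root>
  <div class="lawcon">
    <p><span class="b1"><label> No.1 </label></span></p>
    <p><a title="bb" class="link">Law Power</a></p>
  </div>
  <div class="lawcon">
    <p><span class="b1"><label> No.2 </label></span></p>
    <p><a title="bb" class="link">Just Power</a></p>
  </div>
  <div class="lawcon">
    <p><span class="b1"><label> No.3 </label></span></p>
    <p>
      <a title="bb" class="link">Lawyer</a>
      <a title="bb" class="link">By the Law</a>
      <a title="bb" class="link">But not this one</a>
    </p>
  </div>
</root>
"""


def law_pairs(markup):
    """Return one (label text, link text) tuple per matching link."""
    root = ET.fromstring(markup.strip())
    pairs = []
    for div in root.findall(".//div[@class='lawcon']"):
        label = div.find(".//label")
        if label is None or label.text is None:
            continue  # no label to report for this div
        for link in div.findall(".//a[@title='bb']"):
            # ElementTree's XPath subset has no contains(),
            # so the case-sensitive substring test is done in Python.
            text = (link.text or "").strip()
            if "Law" in text:
                pairs.append((label.text.strip(), text))
    return pairs


pairs = law_pairs(SAMPLE)
print(pairs)
# [('No.1', 'Law Power'), ('No.3', 'Lawyer'), ('No.3', 'By the Law')]
```

As in the Selenium version, the match on 'Law' is case-sensitive, so a link reading "law office" would be skipped.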
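I'm not sure why you want the output in that format. I'd prefer the approach below, which shows how many divs have matching links and lets you access the links per div from the output. Just a thought.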
divs = driver.find_elements_by_xpath("//a[@title='bb' and contains(.,'Law')]//ancestor::div[@class='lawcon']")
linkData = []
for div in divs:
    currentList = []
    for link in div.find_elements_by_xpath(".//a[@title='bb' and contains(.,'Law')]"):
        currentList.append(div.find_element_by_xpath(".//label").text + '-' + link.text)
    linkData.append(currentList)
print(linkData)
Output:
[['No.1-Law Power'], ['No.3-Lawyer', 'No.3-By the Law']]
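If the per-link pairs have already been collected, the per-div view can also be derived from them in plain Python. This is a minimal sketch, assuming the pairs are held as (label, link text) tuples; the helper name `group_by_label` and the inline `flat` data are hypothetical and only mirror the sample from the question.

```python
def group_by_label(pairs):
    """Group link texts under their label, keeping first-seen order."""
    grouped = {}
    for label, text in pairs:
        grouped.setdefault(label, []).append(text)
    return grouped


# Assumption: flat pairs as they would come out of a per-link pass over the sample.
flat = [("No.1", "Law Power"), ("No.3", "Lawyer"), ("No.3", "By the Law")]
grouped = group_by_label(flat)
print(grouped)
# {'No.1': ['Law Power'], 'No.3': ['Lawyer', 'By the Law']}
```

A dict keyed by label makes it easy to look up the links for a given div, at the cost of merging divs that happen to share the same label text.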
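Since your requirement is to extract the text (No.1, etc.) inside the <label> tags, you have to induce WebDriverWait for visibility_of_all_elements_located(). You will get only 2 matches (as opposed to the 3 you expected), because the XPath returns each <label> element once and both matching links in the third div share the same label. You can use the following solution.
Using XPATH: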
print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='lawcon']//a[@title='bb' and contains(.,'Law')]//preceding::label[1]")))])