How can I get text matching certain criteria in Python with Selenium? (text with certain siblings)

This is really tricky for me, so I'll describe the problem in as much detail as I can.

First, let me show you a sample of the HTML.

....
....

<div class="lawcon">
    <p>
        <span class="b1">
            <label> No.1 </label>
        </span>
    </p>

    <p>
    "I want to get the 'No.1' label in the span if the div[@class='lawcon'] has an <a> tag with the "bb" title whose text contains the string 'Law'."
        <a title="bb" class="link" onclick="javascript:blabla('12345')" href="javascript:;">Law Power</a>
    </p>
</div>

<div class="lawcon">
    <p>
        <span class="b1">
            <label> No.2 </label>
        </span>
    </p>

    <p>
    "But I don't want to get the No.2 label because, although it has an <a> tag with the "bb" title, its text doesn't contain 'Law'."
        <a title="bb" class="link" onclick="javascript:blabla('12345')" href="javascript:;">Just Power</a>

    </p>

</div>

<div class="lawcon">
    <p>
        <span class="b1">
            <label> No.3 </label>
        </span>
    </p>

    <p>
    "If there are multiple <a> tags matching the criteria in a single div, I want to get the span (No.3) for each of them."
        <a title="bb" class="link" onclick="javascript:blabla('12345')" href="javascript:;">Lawyer</a>
        <a title="bb" class="link" onclick="javascript:blabla('12345')" href="javascript:;">By the Law</a>
        <a title="bb" class="link" onclick="javascript:blabla('12345')" href="javascript:;">But not this one</a>

...
...
...

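So here's the thing: I want to extract the label text inside div[@class='lawcon'] (e.g. No.1), but only if that div contains an <a> tag with the title "bb" whose text contains the string 'Law'.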

If the div has no <a> tag with the "bb" title, or none of those tags contains the string 'Law', its span should not be collected.

What I tried is:

div_list = [div.text for div in driver.find_elements_by_xpath('//span[following-sibling::a[@title="bb"]]')]

The problem is that when a single div contains multiple <a> tags that meet the criteria, it returns that div only once.

What I want is a list (or tuples) pairing the label text (i.e. the span number) with the text of each matching <a> tag.

So it should look like this:

[[No.1 - Law Power], [No.3 - Lawyer], [No.3 - By the Law]]
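The selection rule and the desired output above can be checked offline, without a browser. The sketch below is not Selenium: it swaps in the standard library's html.parser to walk a well-formed copy of the sample markup and records a (label, link text) pair for every <a title="bb"> whose text contains 'Law' (case-sensitive, like XPath contains()). The names LawconParser and collect_pairs are hypothetical, and it assumes each div.lawcon holds one <label> that appears before its links.

```python
from html.parser import HTMLParser


class LawconParser(HTMLParser):
    """Collect (label, link text) pairs from div.lawcon blocks (offline sketch)."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0     # > 0 while inside a div with class="lawcon"
        self.label = None      # label text of the current div.lawcon
        self.in_label = False
        self.in_link = False
        self.link_text = ""
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.div_depth:
                self.div_depth += 1          # nested div inside div.lawcon
            elif attrs.get("class") == "lawcon":
                self.div_depth = 1
                self.label = None
        elif self.div_depth and tag == "label":
            self.in_label = True
            self.label = ""
        elif self.div_depth and tag == "a" and attrs.get("title") == "bb":
            self.in_link = True
            self.link_text = ""

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1
        elif tag == "label":
            self.in_label = False
        elif tag == "a" and self.in_link:
            self.in_link = False
            text = self.link_text.strip()
            # Keep the link only if its text contains "Law" and a label was seen
            if "Law" in text and self.label:
                self.pairs.append((self.label.strip(), text))

    def handle_data(self, data):
        if self.in_label:
            self.label += data
        if self.in_link:
            self.link_text += data


def collect_pairs(html):
    parser = LawconParser()
    parser.feed(html)
    parser.close()
    return parser.pairs


SAMPLE = """
<div class="lawcon">
  <p><span class="b1"><label> No.1 </label></span></p>
  <p><a title="bb" class="link" href="javascript:;">Law Power</a></p>
</div>
<div class="lawcon">
  <p><span class="b1"><label> No.2 </label></span></p>
  <p><a title="bb" class="link" href="javascript:;">Just Power</a></p>
</div>
<div class="lawcon">
  <p><span class="b1"><label> No.3 </label></span></p>
  <p>
    <a title="bb" class="link" href="javascript:;">Lawyer</a>
    <a title="bb" class="link" href="javascript:;">By the Law</a>
    <a title="bb" class="link" href="javascript:;">But not this one</a>
  </p>
</div>
"""

print([f"{label}-{text}" for label, text in collect_pairs(SAMPLE)])
# ['No.1-Law Power', 'No.3-Lawyer', 'No.3-By the Law']
```

Keeping the pairs as tuples and formatting them only when printing leaves both the list-of-strings and the tuple shapes available.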

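I'm not sure I've explained it well enough. Thanks for your interest, and I hope you can enlighten me with your knowledge! Thank you very much.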

Here is a simple Python script that produces the output you want.

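from selenium.webdriver.common.by import By

# Every <a title="bb"> whose text contains "Law" (case-sensitive).
# find_elements(By.XPATH, ...) is the Selenium 4 form of find_elements_by_xpath.
links = driver.find_elements(By.XPATH, "//a[@title='bb' and contains(.,'Law')]")
linkData = []
for link in links:
    # Walk up to the enclosing div.lawcon and read its <label>
    label = link.find_element(By.XPATH, "./ancestor::div[@class='lawcon']//label").text
    linkData.append([label + '-' + link.text])
print(linkData)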

Output:

[['No.1-Law Power'], ['No.3-Lawyer'], ['No.3-By the Law']]

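I'm not sure why you want the output in that format. I'd prefer the approach below: it tells you how many divs have matching links, and you can then access the links per div from the output. Just an idea.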

from selenium.webdriver.common.by import By

# Each div.lawcon that contains at least one matching link (each div is returned once)
divs = driver.find_elements(By.XPATH, "//div[@class='lawcon'][.//a[@title='bb' and contains(.,'Law')]]")
linkData = []
for div in divs:
    label = div.find_element(By.XPATH, ".//label").text
    currentList = []
    for link in div.find_elements(By.XPATH, ".//a[@title='bb' and contains(.,'Law')]"):
        currentList.append(label + '-' + link.text)
    linkData.append(currentList)
print(linkData)

Output:

[['No.1-Law Power'], ['No.3-Lawyer', 'No.3-By the Law']]
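If you already have the flat (label, link text) pairs that the first script collects, the grouped shape can also be built in plain Python instead of issuing a second round of find_elements calls. A minimal sketch; group_by_label is a hypothetical helper, and it assumes each div's label text is unique (two divs sharing a label would be merged into one group).

```python
from collections import defaultdict


def group_by_label(pairs):
    """Group (label, link_text) pairs into one list of 'label-text' strings per label.

    defaultdict keeps insertion order (Python 3.7+), so groups come out in
    the order in which their first link appeared.
    """
    groups = defaultdict(list)
    for label, link_text in pairs:
        groups[label].append(f"{label}-{link_text}")
    return list(groups.values())


pairs = [("No.1", "Law Power"), ("No.3", "Lawyer"), ("No.3", "By the Law")]
print(group_by_label(pairs))
# [['No.1-Law Power'], ['No.3-Lawyer', 'No.3-By the Law']]
```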

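Since your requirement is to extract texts such as No.1 from the <label> tags, you need to use WebDriverWait with visibility_of_all_elements_located(). You will get only 2 matches (rather than the 3 you expect), because an XPath result contains each <label> node only once, and you can use the following locator strategy: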

  • Using XPATH:

    print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='lawcon']//a[@title='bb' and contains(.,'Law')]//preceding::label[1]")))])
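To see why this locator yields two labels rather than three, the sketch below emulates //a[@title='bb' and contains(.,'Law')]/preceding::label[1] offline with the standard library's html.parser (not Selenium). For every matching link it takes the closest <label> before it in document order and keeps each label node at most once, the way an XPath node-set does; "Lawyer" and "By the Law" both resolve to the same No.3 label. It also shows that innerHTML-style text keeps the surrounding spaces, so stripping may be needed. PrecedingLabelParser and preceding_labels are hypothetical names, and the sketch assumes text-only labels and no <a> nested inside a <label>.

```python
from html.parser import HTMLParser


class PrecedingLabelParser(HTMLParser):
    """Emulate //a[@title='bb' and contains(.,'Law')]/preceding::label[1] (sketch)."""

    def __init__(self):
        super().__init__()
        self.label_id = 0        # counts <label> elements seen so far
        self.label_text = None   # raw text of the most recent label
        self.in_label = False
        self.in_link = False
        self.link_text = ""
        self.seen_ids = set()    # label nodes already in the result
        self.labels = []         # result, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "label":
            self.label_id += 1
            self.label_text = ""
            self.in_label = True
        elif tag == "a" and dict(attrs).get("title") == "bb":
            self.in_link = True
            self.link_text = ""

    def handle_endtag(self, tag):
        if tag == "label":
            self.in_label = False
        elif tag == "a" and self.in_link:
            self.in_link = False
            # Like a node-set: the same label node is added only once
            if ("Law" in self.link_text and self.label_text is not None
                    and self.label_id not in self.seen_ids):
                self.seen_ids.add(self.label_id)
                self.labels.append(self.label_text)

    def handle_data(self, data):
        if self.in_label:
            self.label_text += data
        if self.in_link:
            self.link_text += data


def preceding_labels(html):
    parser = PrecedingLabelParser()
    parser.feed(html)
    parser.close()
    return parser.labels


SAMPLE = """
<div class="lawcon">
  <p><span class="b1"><label> No.1 </label></span></p>
  <p><a title="bb" class="link" href="javascript:;">Law Power</a></p>
</div>
<div class="lawcon">
  <p><span class="b1"><label> No.2 </label></span></p>
  <p><a title="bb" class="link" href="javascript:;">Just Power</a></p>
</div>
<div class="lawcon">
  <p><span class="b1"><label> No.3 </label></span></p>
  <p>
    <a title="bb" class="link" href="javascript:;">Lawyer</a>
    <a title="bb" class="link" href="javascript:;">By the Law</a>
    <a title="bb" class="link" href="javascript:;">But not this one</a>
  </p>
</div>
"""

raw = preceding_labels(SAMPLE)
print(raw)                              # [' No.1 ', ' No.3 ']
print([text.strip() for text in raw])   # ['No.1', 'No.3']
```

If you need one entry per matching link (three results), iterate over the links as in the first answer instead of asking XPath for the labels directly.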