Jsoup: select the text that follows a tag when there are many tags

I want to use jsoup to extract the text that follows certain tags. Is there a way to select it?

Here is the sample markup:

<div class="content">
<div name="panel-summary" id="summary">
    <p>
    <strong>A: </strong>*thank you* **I want to retrieve this text**<br>
    <strong>B: </strong>*Bla..bla* *I don't want this text*<br>
    <strong>C: </strong>*what ever text* *I dont want this*                         
        <strong>D: </strong>*anythinh text* *I want this*<br>
        <strong>E: </strong>*Bla..bla* *I don't want this text*t<br>
        <strong>F: </strong>*anythinh text* *I want this*<br>
    </p>

    <p>I want this</p>

When it finishes, it creates an automatic ID, for example id=123.

If we can assume that the <strong> elements you are looking for always contain A:, D:, or F:, then we can select them with strong:matchesOwn(regex), where the regex is A:|D:|F:.

Once the <strong> elements are handled, we can move on to the second <p> and get its text content with text().
String html = "<div class=\"content\">\n" +
        "<div name=\"panel-summary\" id=\"summary\">\n" +
        "    <p>\n" +
        "    <strong>A: </strong>*thank you* **I want to retrieve this text**<br>\n" +
        "    <strong>B: </strong>*Bla..bla* *I don't want this text*<br>\n" +
        "    <strong>C: </strong>*what ever text* *I dont want this*                         \n" +
        "        <strong>D: </strong>*anythinh text* *I want this*<br>\n" +
        "        <strong>E: </strong>*Bla..bla* *I don't want this text*t<br>\n" +
        "        <strong>F: </strong>*anythinh text* *I want this*<br>\n" +
        "    </p>\n" +
        "\n" +
        "    <p>I want this</p>";

Document doc = Jsoup.parse(html);
Elements pElements = doc.select("#summary p");
Elements strongElements = pElements.first().select("strong:matchesOwn(A:|D:|F:)");
for (Element strong : strongElements) {
    System.out.println(strong.nextSibling()); // next sibling node; here it is the text node after </strong>
}
System.out.println("---");
System.out.println(pElements.get(1).text()); // text content of <p>I want this</p>

Output:

*thank you* **I want to retrieve this text**
*anythinh text* *I want this*
*anythinh text* *I want this*
---
I want this
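
Note that nextSibling() returns a Node, not a String, and it can be null or a non-text node if the markup changes. If you need the text as a clean String, one option is to check for TextNode first. The following is only a minimal sketch; the class name TextAfterStrong, the helper textAfter, and the shortened HTML are made up for illustration and are not part of jsoup or the question:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

import java.util.ArrayList;
import java.util.List;

class TextAfterStrong {

    // Hypothetical helper: collects the trimmed text that directly follows each
    // element matched by cssQuery. Matches not followed by a text node are skipped.
    static List<String> textAfter(Document doc, String cssQuery) {
        List<String> result = new ArrayList<>();
        for (Element el : doc.select(cssQuery)) {
            Node next = el.nextSibling();
            if (next instanceof TextNode) {
                result.add(((TextNode) next).text().trim());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the markup in the question
        String html = "<div id=\"summary\"><p>"
                + "<strong>A: </strong>first<br>"
                + "<strong>B: </strong>second<br>"
                + "<strong>D: </strong>third<br>"
                + "</p></div>";
        Document doc = Jsoup.parse(html);
        System.out.println(textAfter(doc, "#summary p strong:matchesOwn(A:|D:)"));
    }
}
```

With this shortened input the helper should return the text after A: and D: only, since B: does not match the regex.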

If you don't want to rely on the content of the <strong> elements but only on their position, select all of them, for example

Elements allStrongElements = doc.select("#summary p strong");

and then pick the ones you need by index (remember that indexes start at 0), like

System.out.println(allStrongElements.get(0).nextSibling());
System.out.println(allStrongElements.get(3).nextSibling());
System.out.println(allStrongElements.get(5).nextSibling());
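
Keep in mind that get(index) throws an IndexOutOfBoundsException when the page has fewer <strong> elements than expected. If that can happen with your pages, a guarded lookup may be safer. This is just a sketch under that assumption; TextAfterByIndex and siblingTextAt are hypothetical names, and the HTML is a shortened stand-in for the markup in the question:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Node;
import org.jsoup.select.Elements;

class TextAfterByIndex {

    // Hypothetical helper: returns the node that follows the element at the given
    // index, rendered as a trimmed String, or null if the index is out of range
    // or the element has no next sibling.
    static String siblingTextAt(Elements elements, int index) {
        if (index < 0 || index >= elements.size()) {
            return null;
        }
        Node next = elements.get(index).nextSibling();
        return next == null ? null : next.toString().trim();
    }

    public static void main(String[] args) {
        String html = "<div id=\"summary\"><p>"
                + "<strong>A: </strong>first<br>"
                + "<strong>B: </strong>second<br>"
                + "</p></div>";
        Document doc = Jsoup.parse(html);
        Elements strongs = doc.select("#summary p strong");

        System.out.println(siblingTextAt(strongs, 1)); // text after the second <strong>
        System.out.println(siblingTextAt(strongs, 5)); // out of range, so null instead of an exception
    }
}
```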