Remove all elements after a given tag

Question

The tag structure is as follows:

<div class="some-class">
  <h3>Foo</h3>
  <p>...</p>
  <p>...</p>
  <h3>Bar</h3>
  <p>...</p>
  <p>...</p>
  ...

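Once I have found a particular h3 tag, I want to remove all the elements that follow it inside the some-class div. Is there a nextAll() method in JSoup, like the one available in JavaScript (jQuery)?

This is what I have so far: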

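for (Element el : doc.select("div")) {
  if (el.className().equalsIgnoreCase("some-class")) {
    for (Element e : el.select("h3")) {
      // text() returns the element's text; hasText() only returns a boolean
      if (e.text().equalsIgnoreCase("Bar")) {
        removeAllNextPTags(); // placeholder for the part I am missing
      }
    }
  }
}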

Any ideas?

Answer 1

Is there a nextAll() method in JSoup

You can use nextElementSibling() at the Element level, or nextSibling() at the Node level.
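Calling nextElementSibling() repeatedly gives roughly what nextAll() would. Here is a minimal sketch of that idea, assuming markup like the sample in the question; the nextAll helper and the NextAllSketch class are hypothetical names for illustration, not part of jsoup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class NextAllSketch {

    // Hypothetical helper (not jsoup API): collects every element sibling after 'start'.
    static List<Element> nextAll(Element start) {
        List<Element> following = new ArrayList<>();
        for (Element e = start.nextElementSibling(); e != null; e = e.nextElementSibling()) {
            following.add(e);
        }
        return following;
    }

    public static void main(String[] args) {
        // Assumed sample markup, modelled on the structure in the question
        String html = "<div class=\"some-class\">"
                + "<h3>Foo</h3><p>a</p><p>b</p>"
                + "<h3>Bar</h3><p>c</p><p>d</p>"
                + "</div>";
        Document doc = Jsoup.parse(html);

        Element bar = doc.select("div.some-class > h3:contains(Bar)").first();
        if (bar != null) {
            // Collect first, then remove, so the walk never starts from a detached node
            for (Element e : nextAll(bar)) {
                e.remove();
            }
        }
        System.out.println(doc.select("div.some-class").first().html());
    }
}
```

Collecting the siblings before removing them matters: once an element is detached from its parent, nextElementSibling() on it returns null, so removing inside the walk would stop after the first element.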

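I'm not sure I understood you correctly: do you want to remove all elements after the h3, or only the p elements (until another h3 appears)?

Here is a way to remove all p elements that follow the h3 element with the given text, stopping as soon as another h3 is found: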

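import org.jsoup.nodes.Element;

public void removeChilds(Element root, String h3Text)
{
    // :contains() matches on a case-insensitive substring of the element's text
    final Element h3Start = root.select("h3:contains(" + h3Text + ")").first();

    if( h3Start == null )
    {
        return; // No matching h3, nothing to remove
    }

    final int h3Idx = h3Start.siblingIndex();

    // siblingElements() returns a separate list, so removing while iterating is safe
    for( Element e : h3Start.siblingElements() )
    {
        // Skip all nodes before the relevant h3 element
        if( e.siblingIndex() > h3Idx )
        {
            switch(e.tagName())
            {
                case "p":
                    e.remove();
                    break;
                case "h3":
                    /* Stop if there's a h3 */
                    return;
                default:
                    /* Stop also if there's any non-p element!? */
                    return;
            }
        }
    }
}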

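To do this for every h3 with the given text, e.g. several <h3>Foo</h3> elements that are each followed by their own elements, replace first() with a loop over the matched elements (that is what select() returns).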
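A rough sketch of that loop, written as a variant that walks forward with nextElementSibling() instead of comparing siblingIndex() values. The class name, method name and sample markup are made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class RemoveAfterEachH3 {

    // Hypothetical helper: for every h3 containing h3Text, removes the run of
    // <p> siblings directly after it and stops at the first non-p element.
    static void removeParagraphsAfter(Element root, String h3Text) {
        for (Element h3 : root.select("h3:contains(" + h3Text + ")")) {
            Element next = h3.nextElementSibling();
            while (next != null && next.tagName().equals("p")) {
                Element following = next.nextElementSibling(); // read before detaching
                next.remove();
                next = following;
            }
        }
    }

    public static void main(String[] args) {
        // Assumed sample markup with two matching headings
        String html = "<div class=\"some-class\">"
                + "<h3>Foo</h3><p>1</p><p>2</p>"
                + "<h3>Bar</h3><p>3</p>"
                + "<h3>Foo</h3><p>4</p>"
                + "</div>";
        Document doc = Jsoup.parse(html);

        removeParagraphsAfter(doc.body(), "Foo");
        System.out.println(doc.body().html());
    }
}
```

Keep in mind that :contains() matches substrings case-insensitively, so "Foo" would also match a heading such as "Foobar"; compare text() explicitly if you need an exact match.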

java

parsing

html-parsing

jsoup