Jsoup.clean() leaves tags unclosed and opens closed tags

The following code turns <br /> in the input text into <br>:

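// Requires: org.jsoup.Jsoup, org.jsoup.safety.Whitelist
String removeDisallowedTags(String textToEscape) {
    // Start from an empty whitelist and allow only these tags (no attributes).
    Whitelist whitelist = Whitelist.none();
    whitelist.addTags("b", "br", "font");

    String safe = Jsoup.clean(textToEscape, whitelist);
    return safe;
}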

Why?

By default, Jsoup.clean() treats the input as HTML and writes the result as HTML, where <br> is allowed to have no closing tag. The same goes for <img>.

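You have to make jsoup handle the output as XML. That keeps the tags closed, and it will even close them for you. Here is the fixed method, with a few additional settings: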

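// Requires: org.jsoup.Jsoup, org.jsoup.safety.Whitelist,
// org.jsoup.nodes.Document.OutputSettings, java.nio.charset.StandardCharsets
String cleanXmlAndRemoveUnwantedTags(String textToEscape) {
    // allowedTags is not defined in this snippet; it is assumed to be a
    // String[] such as { "b", "br", "font" }.
    Whitelist whitelist = Whitelist.none();
    whitelist.addTags(allowedTags);

    // XML syntax makes jsoup write empty elements in self-closed form.
    OutputSettings outputSettings = new OutputSettings()
                    .syntax(OutputSettings.Syntax.xml)
                    .charset(StandardCharsets.UTF_8)
                    .prettyPrint(false);

    // The empty string is the base URI, used only to resolve relative links.
    String safe = Jsoup.clean(textToEscape, "", whitelist, outputSettings);
    return safe;
}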