JSoup - Formatting the <option> elements

Suppose I have this HTML:

<html>
    <head>
    </head>
    <body>
        <form method="post">
            <select name="books"> 
                <option value="111">111</option>
                <option value="222">222</option>
            </select>
        </form>
    </body>
</html>

I load it into Jsoup and output the result:

Document doc = Jsoup.parse(html);
doc.outputSettings().indentAmount(4);
doc.outputSettings().charset("UTF-8");
doc.outputSettings().prettyPrint(true);
String result = doc.outerHtml();

The result is:

<html>
    <head> 
    </head> 
    <body> 
        <form method="post"> 
            <select name="books"> <option value="111">111</option> <option value="222">222</option> </select> 
        </form>  
    </body>
</html>

The <option> elements are all on the same line!

How can I make Jsoup format the <option> elements so that, in this example, the result matches the input?

doc.outputSettings().charset("UTF-8");

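When you parse HTML from a String, the output charset defaults to UTF-8. It differs only if you parse from a file or stream (for example a FileInputStream) and supply another charset.

So the charset on OutputSettings defaults to the same charset as the input, which in your case is UTF-8. You only need to set it if you want the output charset to differ from the input.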

Document.OutputSettings.charset()

Get the document's current output charset, which is used to control which characters are escaped when generating HTML (via the html() methods), and which are kept intact.

Where possible (when parsing from a URL or File), the document's output charset is automatically set to the input charset. Otherwise, it defaults to UTF-8.


doc.outputSettings().prettyPrint(true);

You don't need to enable pretty printing; it is on by default.

Document.OutputSettings.prettyPrint()

Get if pretty printing is enabled. Default is true. If disabled, the HTML output methods will not re-format the output, and the output will generally look like the input.


doc.outputSettings().outline(true);

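This is the key setting. When it is not set, only block tags are formatted as blocks, each on its own line (option is not a block tag). When it is enabled, all tags are treated as block elements.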

Document.OutputSettings.outline()

Get if outline mode is enabled. Default is false. If enabled, the HTML output methods will consider all tags as block.
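
The defaults quoted above can be checked directly. The following is a minimal sketch, assuming jsoup is on the classpath; the class name OutputSettingsDefaults and the helper describe are made up for illustration and are not part of jsoup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class OutputSettingsDefaults {
    // Hypothetical helper: summarise the output settings of a document.
    static String describe(Document doc) {
        Document.OutputSettings settings = doc.outputSettings();
        return "charset=" + settings.charset().name()
                + " prettyPrint=" + settings.prettyPrint()
                + " outline=" + settings.outline();
    }

    public static void main(String[] args) {
        // Parsed from a plain String, so no input charset is involved.
        Document doc = Jsoup.parse("<p>hello</p>");
        System.out.println(describe(doc));
    }
}
```

Going by the documentation quoted above, a document parsed from a String should report UTF-8 as its charset, pretty printing enabled, and outline mode disabled, which is why only outline needs to be set explicitly.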


So your final code block should look like this:

Document doc = Jsoup.parse(html);
doc.outputSettings().indentAmount(4).outline(true);
String result = doc.outerHtml();

Output:

<html>
    <head> 
    </head> 
    <body> 
        <form method="post"> 
            <select name="books"> 
                <option value="111">111</option> 
                <option value="222">222</option> 
            </select> 
        </form>  
    </body>
</html>
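
For anyone who wants to try this end to end, here is a self-contained sketch that runs the same settings with and without outline mode. It assumes jsoup is on the classpath; the class OptionFormatDemo and its helpers format and optionLines are hypothetical names used only for this example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class OptionFormatDemo {
    // The question's markup, collapsed onto one line as parser input.
    static final String HTML =
            "<html><head></head><body><form method=\"post\">"
            + "<select name=\"books\">"
            + "<option value=\"111\">111</option>"
            + "<option value=\"222\">222</option>"
            + "</select></form></body></html>";

    // Pretty-print with a 4-space indent; outline mode is switchable.
    static String format(String html, boolean outline) {
        Document doc = Jsoup.parse(html);
        doc.outputSettings().indentAmount(4).outline(outline);
        return doc.outerHtml();
    }

    // Count the output lines that contain an opening <option> tag.
    static int optionLines(String formatted) {
        int count = 0;
        for (String line : formatted.split("\n")) {
            if (line.contains("<option")) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("--- outline(false) ---");
        System.out.println(format(HTML, false));
        System.out.println("--- outline(true) ---");
        String outlined = format(HTML, true);
        System.out.println(outlined);
        System.out.println("lines with <option>: " + optionLines(outlined));
    }
}
```

With outline(true), each <option> should land on its own line, so optionLines should return 2 for this input. The exact whitespace of the default, non-outline output can vary between jsoup versions, so the sketch makes no claim about it.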