Escape XML entities only once

Question

I have the following XML fragment in a string (note the double-encoded &):

...
&lt;PARA&gt;
S&amp;amp;P
&lt;/PARA&gt;
...

The output I want is:

> ... <PARA> S&amp;P </PARA> ...

If I use:

StringEscapeUtils.unescapeXml()

the actual output is:

> ... <PARA> S&P </PARA> ...

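It seems that StringEscapeUtils.unescapeXml() unescapes the input twice, or keeps going for as long as the string still contains entities.

Is there a better utility method, or a simple solution, that unescapes every XML entity (not just a few, but all accented characters as well) exactly once, so that my encoded & part is not mangled?

Thanks, Peter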

Answer 1

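When you use a third-party library, you should include the library name and version.

StringEscapeUtils is part of both Apache Commons Text and Apache Commons Lang (where it is deprecated). The latest releases as of November 2017 are Commons Text 1.1 and Commons Lang 3.7. Both give the correct result.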

import org.apache.commons.text.StringEscapeUtils;
public class EscapeTest {
  public static void main(String[] args) {
    final String s = "&lt;PARA&gt; S&amp;amp;P &lt;/PARA&gt;";
    System.out.println(StringEscapeUtils.unescapeXml(s));
  }
}

Output: <PARA> S&amp;P </PARA>
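
One possible explanation for the S&P result in the question, offered only as a guess because the calling code is not shown, is that the string gets unescaped a second time somewhere, for example by an XML parser that decodes it before or after the call. The JDK-only sketch below illustrates the effect. DoubleUnescapeDemo and unescapeOnce are hypothetical names, not part of Commons Text or Commons Lang, and the sketch handles only the five predefined XML entities.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of Commons Text or Commons Lang.
final class DoubleUnescapeDemo {

    // Only the five predefined XML entities, to keep the sketch short.
    private static final Pattern ENTITY = Pattern.compile("&(lt|gt|amp|apos|quot);");

    // Single pass: text produced by a replacement is never scanned again.
    static String unescapeOnce(String s) {
        Matcher m = ENTITY.matcher(s);
        StringBuffer out = new StringBuffer(s.length());
        while (m.find()) {
            String replacement;
            switch (m.group(1)) {
                case "lt":   replacement = "<";  break;
                case "gt":   replacement = ">";  break;
                case "amp":  replacement = "&";  break;
                case "apos": replacement = "'";  break;
                default:     replacement = "\""; break; // "quot"
            }
            // quoteReplacement keeps the replacement text literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String s = "&lt;PARA&gt; S&amp;amp;P &lt;/PARA&gt;";
        String once = unescapeOnce(s);
        String twice = unescapeOnce(once);
        System.out.println(once);   // <PARA> S&amp;P </PARA>
        System.out.println(twice);  // <PARA> S&P </PARA>
    }
}
```

If the input reaches unescapeXml() already decoded by one level, a single correct pass would produce S&P, which would look like a double unescape from the caller's side.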

Answer 2

This may be a long-winded approach, but I can't use Apache Commons.

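import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnescapeOnce {

    // Matches one entity or character reference. Excluding '&' and whitespace
    // keeps a stray '&' from swallowing a later, real entity.
    private static final Pattern XML_ENTITY = Pattern.compile("&(#?)([^&;\\s]+);");

    // The five entities predefined by XML.
    private static final Map<String, String> BUILTIN_ENTITIES = buildBuiltinXMLEntityMap();

    public static void main(String[] args) {
        String a = "&lt;PARA&gt; S&amp;amp;P &lt;/PARA&gt;";
        String ea = unescapeXML(a);
        System.out.println(ea);
    }

    public static String unescapeXML(final String xml) {
        StringBuffer unescapedOutput = new StringBuffer(xml.length());
        Matcher m = XML_ENTITY.matcher(xml);
        while (m.find()) {
            String ent = m.group(2);
            String entity;
            if (!m.group(1).isEmpty()) {
                // Numeric reference: decimal (&#233;) or hexadecimal (&#xE9;).
                try {
                    int code = (ent.startsWith("x") || ent.startsWith("X"))
                            ? Integer.parseInt(ent.substring(1), 16)
                            : Integer.parseInt(ent);
                    // Code-point aware, so references above U+FFFF also work.
                    entity = new String(Character.toChars(code));
                } catch (IllegalArgumentException e) {
                    // Malformed or out-of-range reference: leave it untouched.
                    entity = m.group();
                }
            } else {
                // Named entity; unknown names are left untouched.
                entity = BUILTIN_ENTITIES.getOrDefault(ent, m.group());
            }
            // quoteReplacement keeps '$' and '\' in the result literal.
            m.appendReplacement(unescapedOutput, Matcher.quoteReplacement(entity));
        }
        m.appendTail(unescapedOutput);
        return unescapedOutput.toString();
    }

    private static Map<String, String> buildBuiltinXMLEntityMap() {
        Map<String, String> entities = new HashMap<>(10);
        entities.put("lt", "<");
        entities.put("gt", ">");
        entities.put("amp", "&");
        entities.put("apos", "'");
        entities.put("quot", "\"");
        return entities;
    }
}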

Output:

<PARA> S&amp;P </PARA>
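
The question also asks about accented characters. In XML these normally arrive as numeric references such as &#233; or &#xE9;, while names like &eacute; come from HTML and are not predefined in XML. As a rough sketch of how both could be covered in a single pass without regular expressions, the example below scans the string by index and looks each reference up in a table. SinglePassDecoder, decode, and the two extra table entries are assumptions made for illustration; a real table would need every name the input can contain.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; the class, method, and table contents are illustrative.
final class SinglePassDecoder {

    static final Map<String, String> NAMED = new HashMap<>();
    static {
        NAMED.put("lt", "<");
        NAMED.put("gt", ">");
        NAMED.put("amp", "&");
        NAMED.put("apos", "'");
        NAMED.put("quot", "\"");
        // Assumed extras: HTML names, which are not predefined in XML.
        NAMED.put("eacute", "\u00E9");
        NAMED.put("uuml", "\u00FC");
    }

    static String decode(String in) {
        StringBuilder out = new StringBuilder(in.length());
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            int end = (c == '&') ? in.indexOf(';', i + 1) : -1;
            String decoded = (end > i + 1) ? lookup(in.substring(i + 1, end)) : null;
            if (decoded == null) {
                out.append(c);   // plain character or unrecognised reference
                i++;
            } else {
                out.append(decoded);
                i = end + 1;     // skip the reference; output is never rescanned
            }
        }
        return out.toString();
    }

    // Returns the decoded text, or null if name is not a known reference.
    private static String lookup(String name) {
        if (!name.startsWith("#")) {
            return NAMED.get(name);
        }
        try {
            boolean hex = name.startsWith("#x") || name.startsWith("#X");
            int cp = hex ? Integer.parseInt(name.substring(2), 16)
                         : Integer.parseInt(name.substring(1));
            return new String(Character.toChars(cp));
        } catch (IllegalArgumentException e) {
            return null; // malformed or out-of-range reference
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("&lt;PARA&gt; S&amp;amp;P &lt;/PARA&gt;")); // <PARA> S&amp;P </PARA>
        System.out.println(decode("caf&eacute; &#233; &#xE9;"));
    }
}
```

Because the scan only moves forward and appends decoded text to a separate buffer, the & produced from &amp; is never examined again, which is what keeps S&amp;amp;P from collapsing to S&P.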

Tags: java, xml, apache, html-entities