lxml Python package changes copyright symbol to HTML entity
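I have a Python program that reads XML files and modifies the version attribute. Some of these files also contain a copyright notice with the copyright symbol ©. The lxml package converts these to the HTML entity &#169;. Is there a way to prevent this?
I tried the resolve_entities parameter of the XMLParser function, but it had no effect. I tried both Python 2.7 and 3.6.3. The following program is for Python 3.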
# coding: utf-8
import os
import glob
import argparse
from lxml import etree
xParser = etree.XMLParser(strip_cdata=False, resolve_entities=False)
etree.set_default_parser(xParser)
someXML ='<node version="1.0.1"><copyright>Copyright © 2017 by me</copyright></node>'
doc = etree.fromstring(someXML)
print(someXML)
print(etree.tostring(doc))
It prints:
<node version="1.0.1"><copyright>Copyright © 2017 by me</copyright></node>
b'<node version="1.0.1"><copyright>Copyright &#169; 2017 by me</copyright></node>'
You can specify the unicode encoding when dumping to a string:
etree.tostring(doc, encoding="unicode")
Demo:
In [1]: from lxml import etree
In [2]: xParser = etree.XMLParser(strip_cdata=False, resolve_entities=False)
In [3]: someXML ='<node version="1.0.1"><copyright>Copyright © 2017 by me</copyright></node>'
In [4]: doc = etree.fromstring(someXML, parser=xParser)
In [5]: print(someXML)
<node version="1.0.1"><copyright>Copyright © 2017 by me</copyright></node>
In [6]: print(etree.tostring(doc, encoding="unicode"))
<node version="1.0.1"><copyright>Copyright © 2017 by me</copyright></node>
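For context on why resolve_entities makes no difference: the &#169; is produced when the tree is serialized, not when it is parsed. etree.tostring() defaults to ASCII bytes, so any non-ASCII character is written as a numeric character reference. Below is a minimal sketch comparing the output encodings. The version bump to "1.0.2" and the in-memory buffer are stand-ins I made up for the asker's real edit and output file, not part of the original program.

```python
import io
from lxml import etree

# "\u00a9" is the copyright symbol, escaped so this source stays pure ASCII.
some_xml = '<node version="1.0.1"><copyright>Copyright \u00a9 2017 by me</copyright></node>'
doc = etree.fromstring(some_xml)

# Hypothetical edit standing in for the "modify the version attribute" step.
doc.set("version", "1.0.2")

# Default serialization is ASCII bytes, so the symbol becomes a character reference.
ascii_bytes = etree.tostring(doc)

# encoding="unicode" returns a str that keeps the literal symbol.
text = etree.tostring(doc, encoding="unicode")

# encoding="utf-8" returns bytes with the symbol encoded as UTF-8.
utf8_bytes = etree.tostring(doc, encoding="utf-8")

# Writing the tree to a file-like object (assumed here; a real file path also works).
buffer = io.BytesIO()
etree.ElementTree(doc).write(buffer, encoding="utf-8", xml_declaration=True)

print(ascii_bytes)
print(text)
```

If the result goes back to a file rather than to the console, passing encoding="utf-8" to write() is probably the more convenient choice, since it keeps the symbol and produces bytes ready to be written.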