XMLParser skips attributes while parsing XML Schema file

Question

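I need to read an XML Schema file and extract only the elements that have the attribute minOccurs="0". The problem is that the XML parser drops that attribute when it parses the document.

The code below shows what I mean.

I have an example XML file: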

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sql="urn:schemas-microsoft-com:mapping-schema">
    <xsd:include schemaLocation="def.xml"/>
    <xsd:element name="MainElementName">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="A">
                    <xsd:complexType>
                        <xsd:attribute name="AA" required="False" type="string"/>
                    </xsd:complexType>
                </xsd:element>
                <xsd:element name="B" minOccurs="0" maxOccurs="unbounded">
                    <xsd:complexType>
                        <xsd:attribute name="BA" type="string"/>
                    </xsd:complexType>
                </xsd:element>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>

Then I parse it with this code:

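    from lxml import etree as et

    try:
        # et.parse() opens the file itself, so no separate open() is needed.
        # remove_blank_text drops ignorable whitespace so pretty_print can re-indent.
        parser = et.XMLParser(remove_blank_text=True)
        tree = et.parse(xsd_path, parser)
        tmp_text = et.tostring(tree, pretty_print=True, encoding=str)
    except OSError as e:
        print(e)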

I get this output:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sql="urn:schemas-microsoft-com:mapping-schema">
    <xsd:include schemaLocation="def.xml"/>
    <xsd:element name="MainElementName">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="A">
                    <xsd:complexType>
                        <xsd:attribute name="AA" type="string"/>
                    </xsd:complexType>
                </xsd:element>
                <xsd:element name="B">
                    <xsd:complexType>
                        <xsd:attribute name="BA" type="string"/>
                    </xsd:complexType>
                </xsd:element>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>

I don't know why the parser drops the required attribute on xsd:attribute and the minOccurs/maxOccurs attributes on xsd:element. Does anyone know how to fix this?

Answer 1

Your code is not correct, and I could not get it to run. For example, ElementTree.tostring() takes an Element instance, but you are passing an ElementTree instance (tree).

This code works for me:

import xml.etree.ElementTree as et

parser = et.XMLParser()
tree = et.parse('/path/to.xml', parser)
tmp_text = et.tostring(tree.getroot(), encoding='unicode')
print(tmp_text)

With Python 3.6.8 the output is:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:include schemaLocation="def.xml" />
    <xs:element name="MainElementName">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="A">
                    <xs:complexType>
                        <xs:attribute name="AA" required="False" type="string" />
                    </xs:complexType>
                </xs:element>
                <xs:element maxOccurs="unbounded" minOccurs="0" name="B">
                    <xs:complexType>
                        <xs:attribute name="BA" type="string" />
                    </xs:complexType>
                </xs:element>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
</xs:schema>
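
Since the attributes survive parsing here, the original goal (picking out only the elements declared with minOccurs="0") can be sketched on top of the same standard-library module. This is a minimal sketch, not a definitive implementation: the helper name optional_element_names, the XSD_NS mapping, and the trimmed inline SCHEMA string are illustrative assumptions, not part of the question's code.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI mapping used only for the XPath query below (illustrative name).
XSD_NS = {"xsd": "http://www.w3.org/2001/XMLSchema"}

# Trimmed copy of the schema from the question, inlined so the sketch is self-contained.
SCHEMA = """\
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:element name="MainElementName">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="A"/>
                <xsd:element name="B" minOccurs="0" maxOccurs="unbounded"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>
"""


def optional_element_names(root):
    """Return the names of xsd:element nodes declared with minOccurs="0"."""
    # ElementTree's limited XPath supports [@attr='value'] predicates.
    matches = root.findall(".//xsd:element[@minOccurs='0']", XSD_NS)
    return [el.get("name") for el in matches]


root = ET.fromstring(SCHEMA)
print(optional_element_names(root))  # ['B']
```

For a schema stored on disk, ET.parse(path).getroot() could be passed to the same helper in place of ET.fromstring(SCHEMA).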

Tags: python, xsd, lxml