lxml include relative path
Using Python's lxml library, I am trying to load an .xsd file as a schema. The Python script is in one directory and the schemas are in another:
/root
    my_script.py
    /data
        /xsd
            schema_1.xsd
            schema_2.xsd
The problem is that schema_1.xsd includes schema_2.xsd like this:
<xsd:include schemaLocation="schema_2.xsd"/>
Because schema_2.xsd is referenced by a relative path (both schemas are in the same directory), lxml cannot find it and raises an error:
schema_root = etree.fromstring(open('data/xsd/schema_1.xsd').read().encode('utf-8'))
schema = etree.XMLSchema(schema_root)
--> xml.etree.XMLSchemaParseError: Element '{http://www.w3.org/2001/XMLSchema}include': Failed to load the document './schema_2.xsd' for inclusion
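The error most likely comes from loading the schema with etree.fromstring: a document built from a string carries no base URL, so libxml2 has nothing to resolve schemaLocation="schema_2.xsd" against except the current working directory. A minimal sketch of the difference, using a throwaway schema written to a temporary directory (the file name is only illustrative):

```python
import os
import tempfile

from lxml import etree

# A trivial stand-in schema; its content does not matter for this demo.
XSD = b'<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"/>'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "schema_1.xsd")
    with open(path, "wb") as f:
        f.write(XSD)

    # Parsed from bytes: lxml has no idea where the data came from.
    url_from_string = etree.fromstring(XSD).getroottree().docinfo.URL

    # Parsed from the file: the document records its own location,
    # which is what relative schemaLocation values are resolved against.
    url_from_file = etree.parse(path).docinfo.URL

print(url_from_string)  # None
print(url_from_file)    # a temporary path ending in schema_1.xsd
```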
How can I solve this without changing the schema files?
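One option is to use an XML Catalog. You could also probably use a custom URI Resolver, but I have always used catalogs. A catalog lets non-developers change the configuration without touching code, which is especially useful if you ship an executable rather than plain Python.
Windows and Linux use catalogs differently; see here for more info.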
Here is a Windows example using Python 3.#.
XSD#1 (schema_1.xsd)
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:include schemaLocation="schema_2.xsd"/>
  <xs:element name="doc">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="test"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="test" type="test"/>
</xs:schema>
XSD#2 (schema_2.xsd)
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:simpleType name="test">
    <xs:restriction base="xs:string">
      <xs:enumeration value="Hello World"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
XML Catalog (catalog.xml)
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE catalog PUBLIC "-//OASIS//DTD XML Catalogs V1.1//EN" "http://www.oasis-open.org/committees/entity/release/1.1/catalog.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <!-- The path in @uri is relative to this file (catalog.xml). -->
  <system systemId="schema_2.xsd" uri="./xsd_test/schema_2.xsd"/>
</catalog>
Python
import os
from urllib.request import pathname2url

from lxml import etree

# The XML_CATALOG_FILES environment variable is used by libxml2 (which is used by lxml).
# See http://xmlsoft.org/catalog.html.
if "XML_CATALOG_FILES" not in os.environ:
    # The catalog path must be given as a URL.
    catalog_path = f"file:{pathname2url(os.path.join(os.getcwd(), 'catalog.xml'))}"
    # Set the environment variable for this process only.
    os.environ['XML_CATALOG_FILES'] = catalog_path

schema_root = etree.fromstring(open('xsd_test/schema_1.xsd').read().encode('utf-8'))
schema = etree.XMLSchema(schema_root)
print(schema)
Printed output
<lxml.etree.XMLSchema object at 0x02B4B3F0>
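The example builds the catalog URL by hand with pathname2url. pathlib can produce a well-formed file: URL directly, which may be less error-prone across Windows and Linux. A small sketch, assuming (as in the example) that catalog.xml sits in the current working directory. As far as I understand, libxml2 reads XML_CATALOG_FILES once, when it first initializes catalog support, so it is probably safest to set it before the first schema is parsed.

```python
import os
from pathlib import Path

# Path.as_uri() needs an absolute path and returns a proper file:// URL
# (file:///C:/... on Windows, file:///home/... on Linux).
catalog_url = (Path.cwd() / "catalog.xml").as_uri()

# Same behaviour as the `if ... not in os.environ` check in the example:
# an existing value is left untouched.
os.environ.setdefault("XML_CATALOG_FILES", catalog_url)
print(os.environ["XML_CATALOG_FILES"])
```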
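There may be a simpler solution for your case. I ran into this problem today and solved it by temporarily changing the current working directory while loading the XML schema: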
import os
from lxml import etree

xml_schema_path = 'data/xsd/schema_1.xsd'

# Remember the working directory the script was run from
run_dir = os.getcwd()
# Switch to the schema directory so relative includes resolve from there
os.chdir(os.path.dirname(xml_schema_path))
try:
    # Load the schema. Note that the `file=` option accepts a file path
    xml_schema = etree.XMLSchema(file=os.path.basename(xml_schema_path))
finally:
    # Restore the original working directory, even if loading fails
    os.chdir(run_dir)
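Changing the directory works because includes are resolved relative to the schema's own location. That suggests a variant that skips os.chdir entirely: give lxml the schema's real path, either by letting XMLSchema open the file itself or by passing base_url to fromstring. A minimal sketch, reusing the two schemas from the catalog example and writing them to a temporary directory that stands in for data/xsd. On Windows it may be safer to pass Path(schema_path).as_uri() as base_url; that part is an assumption I have not checked.

```python
import os
import tempfile

from lxml import etree

SCHEMA_1 = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:include schemaLocation="schema_2.xsd"/>
  <xs:element name="doc">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="test"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="test" type="test"/>
</xs:schema>"""

SCHEMA_2 = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:simpleType name="test">
    <xs:restriction base="xs:string">
      <xs:enumeration value="Hello World"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>"""

with tempfile.TemporaryDirectory() as schema_dir:  # stands in for data/xsd
    for name, data in (("schema_1.xsd", SCHEMA_1), ("schema_2.xsd", SCHEMA_2)):
        with open(os.path.join(schema_dir, name), "wb") as f:
            f.write(data)
    schema_path = os.path.join(schema_dir, "schema_1.xsd")

    # Option 1: let lxml open the file; its path becomes the base URL
    # against which schemaLocation="schema_2.xsd" is resolved.
    schema_from_file = etree.XMLSchema(file=schema_path)

    # Option 2: keep reading the bytes yourself, but tell lxml where
    # they came from via base_url.
    with open(schema_path, "rb") as f:
        root = etree.fromstring(f.read(), base_url=schema_path)
    schema_from_string = etree.XMLSchema(root)

good = etree.fromstring(b"<doc><test>Hello World</test></doc>")
bad = etree.fromstring(b"<doc><test>Goodbye</test></doc>")
print(schema_from_file.validate(good), schema_from_file.validate(bad))      # True False
print(schema_from_string.validate(good), schema_from_string.validate(bad))  # True False
```

Neither option touches the process-wide working directory, which matters if other threads or libraries rely on it.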