How does one properly validate an XML file against a local DTD file using Nokogiri?
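I have a simple, valid DTD and a valid XML file that appears to conform to it, but Nokogiri generates a lot of validation output, which means the XML file fails validation.
The DTD file is: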
<!ELEMENT protocol (copyright?, description?, interface+)>
<!ATTLIST protocol name CDATA #REQUIRED>
<!ELEMENT copyright (#PCDATA)>
<!ELEMENT interface (description?,(request|event|enum)+)>
<!ATTLIST interface name CDATA #REQUIRED>
<!ATTLIST interface version CDATA #REQUIRED>
<!ELEMENT request (description?,arg*)>
<!ATTLIST request name CDATA #REQUIRED>
<!ATTLIST request type CDATA #IMPLIED>
<!ATTLIST request since CDATA #IMPLIED>
<!ELEMENT event (description?,arg*)>
<!ATTLIST event name CDATA #REQUIRED>
<!ATTLIST event since CDATA #IMPLIED>
<!ELEMENT enum (description?,entry*)>
<!ATTLIST enum name CDATA #REQUIRED>
<!ATTLIST enum since CDATA #IMPLIED>
<!ATTLIST enum bitfield CDATA #IMPLIED>
<!ELEMENT entry (description?)>
<!ATTLIST entry name CDATA #REQUIRED>
<!ATTLIST entry value CDATA #REQUIRED>
<!ATTLIST entry summary CDATA #IMPLIED>
<!ATTLIST entry since CDATA #IMPLIED>
<!ELEMENT arg (description?)>
<!ATTLIST arg name CDATA #REQUIRED>
<!ATTLIST arg type CDATA #REQUIRED>
<!ATTLIST arg summary CDATA #IMPLIED>
<!ATTLIST arg interface CDATA #IMPLIED>
<!ATTLIST arg allow-null CDATA #IMPLIED>
<!ATTLIST arg enum CDATA #IMPLIED>
<!ELEMENT description (#PCDATA)>
<!ATTLIST description summary CDATA #REQUIRED>
The XML file is:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE protocol SYSTEM "wayland.dtd">
<protocol name="wayland">
<copyright>
FOO
SOFTWARE.
</copyright>
<interface name="wl_display" version="1">
<description summary="core global object">
The core global object. This is a special singleton object. It
is used for internal Wayland protocol features.
</description>
<request name="sync">
<description summary="asynchronous roundtrip">
The sync request asks the server to emit the 'done' event
on the returned wl_callback object. Since requests are
handled in-order and events are delivered in-order, this can
be used as a barrier to ensure all previous requests and the
resulting events have been handled.
The object returned by this request will be destroyed by the
compositor after the callback is fired and as such the client must not
attempt to use it after that point.
The callback_data passed in the callback is the event serial.
</description>
<arg name="callback" type="new_id" interface="wl_callback"/>
</request>
</interface>
</protocol>
My simple Ruby program is:
require 'nokogiri'
DTD_PATH = "wayland.dtd"
XML_PATH = "wayland.xml"
dtd_doc = Nokogiri::XML::Document.parse(open(DTD_PATH))
dtd = Nokogiri::XML::DTD.new('protocol', dtd_doc)
doc = Nokogiri::XML(open(XML_PATH))
puts dtd.validate(doc)
The program prints the contents of the validation array, which is not empty. Sample output:
No declaration for attribute name of element request
No declaration for element description
No declaration for attribute summary of element description
Even after adding a DOCTYPE declaration to the XML file, a la:
<!DOCTYPE protocol SYSTEM "wayland.dtd">
and wrapping the DTD as:
<!DOCTYPE protocol [
...
]>
I still observe the same failed validation output. What am I doing wrong?
You can validate by specifying ParseOptions. The XML file needs to name its document type with a document type declaration: <!DOCTYPE protocol SYSTEM "wayland.dtd">
require 'nokogiri'
DTD_PATH = "wayland.dtd"
XML_PATH = "wayland.xml"
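xml = File.read(XML_PATH)
# DTDVALID asks the parser to validate against the DTD named in the DOCTYPE.
options = Nokogiri::XML::ParseOptions::DTDVALID
doc = Nokogiri::XML::Document.parse(xml, nil, nil, options)
# validate returns a list of errors; an empty list means the document is valid.
puts doc.external_subset.validate(doc)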