lxml xpath RDF not working
I am trying to extract the /RDF/Description/id/text() string, which should be someid in the document below. What is the right XPath to extract it with Python's lxml?
<?xml version="1.0" encoding="utf-8"?>
<!-- This Source Code Form is subject to the terms of the Mozilla Public
- License, v. 2.0. If a copy of the MPL was not distributed with this
- file, You can obtain one at http://mozilla.org/MPL/2.0/. -->
<RDF xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:em="http://www.mozilla.org/2004/em-rdf#">
<Description about="urn:mozilla:install-manifest">
<em:id>my-extension@mozilla</em:id>
<em:version>initial</em:version>
<em:type>2</em:type>
<em:bootstrap>true</em:bootstrap>
<em:unpack>false</em:unpack>
<!-- Firefox -->
<em:targetApplication>
<Description>
<em:id>{someid}</em:id>
<em:minVersion>7.0</em:minVersion>
<em:maxVersion>27.0</em:maxVersion>
</Description>
</em:targetApplication>
<!-- Front End MetaData -->
<!-- must provide default non-localized because It's used as a default on AMO. It's used as a default by the add-on manager, with the possibility of other locales overriding it. Failure to provide a non-localized name will lead to failed upload on AMO. -->
<em:name>l10n</em:name>
<em:description>ff-addon-demo: Shows how to localize restartless add-ons.</em:description>
<em:creator>Noitidart</em:creator>
<!-- start localizing -->
<em:localized>
<Description>
<em:locale>en-GB</em:locale>
<em:name>l10n :: en-GB</em:name>
<em:description>en-GB :: ff-addon-demo: Shows how to localize restartless add-ons. </em:description>
<em:creator>en-GB :: Noitidart</em:creator>
</Description>
</em:localized>
<em:localized>
<Description>
<em:locale>en-US</em:locale>
<em:name>l10n :: en-US</em:name>
<em:description>en-US :: ff-addon-demo: Shows how to localize restartless add-ons. </em:description>
<em:creator>en-US :: Noitidart</em:creator>
</Description>
</em:localized>
</Description>
</RDF>
I have tried all of the following and wasted a lot of time:
"*/*[4]" , "*/*[4]" , "*/*" , "@my:*" , "em:*" , "my:*" , "@*" , "//id" , "//em:id" , "//em" , "//*[text()='USA']" , "{http://www.mozilla.org/2004/em-rdf#}:localized" , "*/*" , "//tag:RDF" , "//*RDF" , "/RDF/Description/em:targetApplication" , "*/localized" , "*/*localized" , "*/*" , "*/*" , "*/*" , "*/*" , "*/*" , "*/*" , "*/http://www.mozilla.org/2004/em-rdf#" , "*/RDF" , "*/*" , "/RDF" , "//RDF" , "/RDF", ".//Description" , "//?xml" , "//about" , "//em" , "//Description" , "/RDF" , "*/*" , "*/Description" , "*/Descriptoin" , "*" , "./?xml" , "?xml" , "//?xml" , "//http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF" , "http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF" , "//version" , "//xml" , "//" , "//RDF" , "./version" , "version" , "xml" , "/RDF/Description/*" , "/RDF/Description"
Edit: after the solution below, I found this good reference on this common problem:
https://msdn.microsoft.com/en-us/library/ms950779.aspx
Here is one possible way. Note how the XPath follows the XML structure, and how elements that live in a namespace are referenced through a prefix:
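from lxml import etree

# Use a bytes literal: lxml rejects str input that carries an encoding declaration.
xml = b"""your xml as posted in question here"""
root = etree.fromstring(xml)
# The prefix 'd' is arbitrary; it is bound here to the document's default (RDF) namespace.
nsmap = {'d': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
         'em': 'http://www.mozilla.org/2004/em-rdf#'}
result = root.xpath("/d:RDF/d:Description/em:targetApplication/d:Description/em:id/text()",
                    namespaces=nsmap)
print(result)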
Output:
['{someid}']
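To see why most of the attempts in the question come back empty, here is a small self-contained sketch. It uses a trimmed copy of the manifest, cut down only to keep the example short. The names XML, RDF_NS, EM_NS and NS are illustrative and not part of lxml. In XPath 1.0 an unprefixed name such as RDF or id matches only elements in no namespace, while every element in this file is in either the RDF namespace (the default xmlns) or the em namespace.

```python
from lxml import etree

# Trimmed copy of the manifest from the question (bytes, because it has an encoding declaration).
XML = b"""<?xml version="1.0" encoding="utf-8"?>
<RDF xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:em="http://www.mozilla.org/2004/em-rdf#">
<Description about="urn:mozilla:install-manifest">
<em:id>my-extension@mozilla</em:id>
<em:targetApplication>
<Description>
<em:id>{someid}</em:id>
</Description>
</em:targetApplication>
</Description>
</RDF>"""

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EM_NS = "http://www.mozilla.org/2004/em-rdf#"
NS = {"d": RDF_NS, "em": EM_NS}  # prefixes are arbitrary labels chosen for this sketch

root = etree.fromstring(XML)

# Unprefixed names match only elements in no namespace, so both of these are empty.
no_ns_root = root.xpath("/RDF")
no_ns_id = root.xpath("//id")

# With the prefixes bound, the same kind of steps match.
all_ids = root.xpath("//em:id/text()", namespaces=NS)
target_id = root.xpath("//em:targetApplication/d:Description/em:id/text()", namespaces=NS)

# find()/findtext() also accept Clark notation, {uri}localname, with no prefix map.
clark_id = root.findtext(
    "{%s}Description/{%s}targetApplication/{%s}Description/{%s}id"
    % (RDF_NS, EM_NS, RDF_NS, EM_NS)
)

print(no_ns_root, no_ns_id)
print(all_ids)
print(target_id, clark_id)
```

Note that "//em:id" on its own selects both em:id elements, so the path has to go through em:targetApplication to isolate {someid}.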