Python: specifying the namespace in an lxml.etree path
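I'm trying to work out how to access a specific element in an SVG file by its ID. I'm using the lxml library for Python to parse the file, but the result is always empty. Here is the Python script I use to access the element: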
#!/usr/bin/env python
from lxml import etree
XHTML_NAMESPACE = "http://www.w3.org/2000/svg"
XHTML = "{%s}" % XHTML_NAMESPACE
NSMAP = {None : XHTML_NAMESPACE}
root = etree.parse("temp.svg")
textid = "text1274"
path = ".//text[@id='" + textid + "']/title"
name = root.findtext(path=path, namespaces=NSMAP)
print name
The result is always None, and no error is raised. The call appears to succeed, but what I want is the element text (which should be "Wei, 771 - 661BCE."). Here is the offending SVG file:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<svg
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:cc="http://creativecommons.org/ns#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:svg="http://www.w3.org/2000/svg"
xmlns="http://www.w3.org/2000/svg"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
version="1.1"
xml:space="preserve"
viewBox="0 0 54001 32400"
id="svg2"
inkscape:version="0.91 r"
sodipodi:docname="china700BC.svg"><sodipodi:namedview
pagecolor="#ffffff"
bordercolor="#666666"
borderopacity="1"
objecttolerance="10"
gridtolerance="10"
guidetolerance="10"
inkscape:pageopacity="0"
inkscape:pageshadow="2"
inkscape:window-width="1366"
inkscape:window-height="692"
id="namedview2468"
showgrid="false"
inkscape:zoom="0.016419753"
inkscape:cx="17689.896"
inkscape:cy="17739.986"
inkscape:window-x="0"
inkscape:window-y="24"
inkscape:window-maximized="1"
inkscape:current-layer="svg2" />
<defs
id="defs4">
<filter
id="blur2">
<feGaussianBlur
id="feGaussianBlur7"
result="blur"
stdDeviation="2"
in="SourceGraphic" />
</filter>
<filter
id="blur4">
<feGaussianBlur
id="feGaussianBlur10"
result="blur"
stdDeviation="4"
in="SourceGraphic" />
</filter>
<filter
id="blur8">
<feGaussianBlur
id="feGaussianBlur13"
result="blur"
stdDeviation="8"
in="SourceGraphic" />
</filter>
<filter
id="blur16">
<feGaussianBlur
id="feGaussianBlur16"
result="blur"
stdDeviation="16"
in="SourceGraphic" />
</filter>
<filter
id="blur32">
<feGaussianBlur
id="feGaussianBlur19"
result="blur"
stdDeviation="32"
in="SourceGraphic" />
</filter>
<filter
id="blur64">
<feGaussianBlur
id="feGaussianBlur22"
result="blur"
stdDeviation="64"
in="SourceGraphic" />
</filter>
</defs>
>
<g
stroke-linecap="round"
stroke-linejoin="round"
stroke-miterlimit="7"
stroke-width="14"
fill="none"
filter="url(#blur2)"
id="fntr">
<ellipse
id="ellipse381"
fill="white"
stroke="white"
ry="1"
rx="1"
cy="0"
cx="0" />
<ellipse
id="ellipse383"
fill="white"
stroke="white"
ry="1"
rx="1"
cy="32400"
cx="54001" />
<ellipse
fill="#FEBADE"
ry="1"
rx="1"
cy="24759"
cx="48948"
id="295286-dummy" />
</g>
<g
text-anchor="middle"
id="regn">
</g>
<g
text-anchor="middle"
id="cultr">
</g>
<g
text-anchor="middle"
id="peopl">
</g>
<g
font-style="italic"
text-anchor="middle"
id="tribe">
</g>
<text
id="text455"
x="30542.088"
y="16248.173"
font-size="20"
style="font-weight:normal;font-size:233.01080322px;text-anchor:middle"><title
id="title457">Chen.</title>Chen</text>
<text
id="text1274"
x="28689.652"
y="12753.011"
font-size="28"
style="font-weight:normal;font-size:326.21511841px;text-anchor:middle"><title
id="title1276">Wei, 771 - 661BCE.</title>Wei</text>
<script
id="script2466">
function LoadHandler(event)
{
new Title(event.getTarget().getOwnerDocument(), 810);
}
</script>
</svg>
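I've found that I can make the problem go away by removing the eighth line, the one starting with "xmlns=..." (the default namespace declaration). However, because of where I get this file from, I can't permanently remove that line (and probably shouldn't). Is there a way (for example, specifying the namespace correctly) to get the expected output without editing the XML at all?
Thanks very much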
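Mapping the default namespace to the None prefix didn't work for me either. However, you can map it to an ordinary string prefix and use that prefix in the XPath; the rest of your code then works without any changes: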
from lxml import etree
XHTML_NAMESPACE = "http://www.w3.org/2000/svg"
XHTML = "{%s}" % XHTML_NAMESPACE
NSMAP = {'d' : XHTML_NAMESPACE} # map default namespace to prefix 'd:'
root = etree.parse("temp.svg")
textid = "text1274"
path = ".//d:text[@id='" + textid + "']/d:title" # use registered prefix in xpath
name = root.findtext(path=path, namespaces=NSMAP)
print(name)  # parenthesized form runs on both Python 2 and Python 3
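To make the difference easy to reproduce without temp.svg, here is a minimal sketch that uses a trimmed-down inline copy of the two text elements from the question. The names SVG_NS, SAMPLE and title_for are made up for this illustration, and the fallback import of the standard library's ElementTree is only an assumption so that the snippet also runs where lxml is not installed. It compares the prefix approach above with the `{namespace}tag` form, which needs no prefix map at all:

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: the stdlib ElementTree is an acceptable stand-in for this demo
    import xml.etree.ElementTree as etree

SVG_NS = "http://www.w3.org/2000/svg"  # hypothetical name for the SVG namespace URI

# Trimmed-down stand-in for temp.svg: only the default namespace and two <text> elements
SAMPLE = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<text id="text455"><title id="title457">Chen.</title>Chen</text>'
    '<text id="text1274"><title id="title1276">Wei, 771 - 661BCE.</title>Wei</text>'
    '</svg>'
)


def title_for(root, text_id):
    """Look up the <title> of a <text> element using fully qualified {uri}tag names."""
    path = ".//{%s}text[@id='%s']/{%s}title" % (SVG_NS, text_id, SVG_NS)
    return root.findtext(path)


root = etree.fromstring(SAMPLE)

# 1. Prefix approach from the answer: bind the default namespace to 'd'
by_prefix = root.findtext(".//d:text[@id='text1274']/d:title",
                          namespaces={"d": SVG_NS})

# 2. Qualified-name approach: spell out the namespace URI in the path itself
by_clark = title_for(root, "text1274")

# 3. Unqualified path, as in the question: no element matches, so None comes back
unqualified = root.findtext(".//text[@id='text1274']/title")

print(by_prefix)
print(by_clark)
print(unqualified)
```

The unqualified path finds nothing because every element in the file lives in the SVG namespace, so a bare `text` names a different (namespace-less) element. That is also why deleting the xmlns="..." line made the original script work.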