How do I replace an element in lxml with a string?
I'm trying to figure out how to replace an element with a string using lxml and Python.
In my experiments, I have the following code:
from lxml import etree as et
docstring = '<p>The value is permitted only when that includes <xref linkend=\"my linkend\" browsertext=\"something here\" filename=\"A_link.fm\"/>, otherwise the value is reserved.</p>'
topicroot = et.XML(docstring)
topicroot2 = et.ElementTree(topicroot)
xref = topicroot2.xpath('//*/xref')
xref_attribute = xref[0].attrib['browsertext']
print(xref_attribute)
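The result is: 'something here'
That is the browsertext attribute I'm looking for in this small sample. What I can't figure out is how to replace the entire element with the attribute text I've captured here.
(I do recognize that my sample could contain multiple xrefs and that I'd need to build a loop to go through them properly.)
What's the best way to go about doing this?
For those wondering, I have to do this because the link actually points to a file that doesn't exist, due to our different build systems.
Thanks in advance!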
Try this (Python 3):
from lxml import etree as et
docstring = '<p>The value is permitted only when that includes <xref linkend=\"my linkend\" browsertext=\"something here\" filename=\"A_link.fm\"/>, otherwise the value is reserved.</p>'
# Get the root element.
topicroot = et.XML(docstring)
topicroot2 = et.ElementTree(topicroot)
# Get the text of the root element. This is a list of strings!
topicroot2_text = topicroot2.xpath("text()")
# Get the xref element.
xref = topicroot2.xpath('//*/xref')[0]
xref_attribute = xref.attrib['browsertext']
# Save a reference to the p element and remove the xref from it.
# Note: lxml's remove() also drops the xref's tail text.
parent = xref.getparent()
parent.remove(xref)
# Set the text of the p element by combining the list of strings with the
# extracted attribute value. This assumes xref is the only child of p.
new_text = [topicroot2_text[0], xref_attribute, topicroot2_text[1]]
parent.text = "".join(new_text)
print(et.tostring(topicroot2))
Output:
b'<p>The value is permitted only when that includes something here, otherwise the value is reserved.</p>'
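The snippet above relies on there being exactly one xref and no other child elements. For the multiple-xref case mentioned in the question, a more general approach is to splice the replacement string into the surrounding text/tail before removing each element. The following is only a sketch under that assumption: `replace_with_text` is a hypothetical helper name (not part of lxml), and the two-xref sample document is made up for illustration.

```python
from lxml import etree as et


def replace_with_text(element, text):
    """Replace `element` in its parent with plain `text`, keeping its tail.

    Hypothetical helper for illustration; not an lxml API.
    """
    parent = element.getparent()
    if parent is None:
        raise ValueError("cannot replace the root element")
    # What must take the element's place: the replacement plus its tail text.
    new_text = text + (element.tail or "")
    previous = element.getprevious()
    if previous is not None:
        # An earlier sibling exists, so the text goes into that sibling's tail.
        previous.tail = (previous.tail or "") + new_text
    else:
        # The element is the first child, so the text extends the parent's text.
        parent.text = (parent.text or "") + new_text
    # remove() drops the element together with its tail, which is why the
    # tail was copied into new_text above.
    parent.remove(element)


# Made-up sample with two xref elements.
doc = ('<p>See <xref browsertext="first"/> and '
       '<xref browsertext="second"/>, then stop.</p>')
root = et.XML(doc)

# Build the list first, because the tree is modified inside the loop.
for xref in list(root.iter("xref")):
    replace_with_text(xref, xref.get("browsertext", ""))

result = et.tostring(root, encoding="unicode")
print(result)
```

Using `xref.get("browsertext", "")` instead of `xref.attrib['browsertext']` means an xref without that attribute is simply removed rather than raising a KeyError; whether that is the right behavior depends on your documents.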