How to find and edit tags in XML files with namespaces using ElementTree

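I want to find specific tags in my XML document and edit their text or attributes. My XML file contains namespaces (nested namespaces, if I understand correctly). The tool I want to use for this is ElementTree. I managed to read the XML file with iterparse, but I don't know how to save the edited XML, because iterparse has no write method. I need either a way to read the XML file with parse and strip its namespaces and nested namespaces, or a way to save the iterparsed file.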

For this case, let's edit the text of the "Rating" tag.

from xml.etree import ElementTree as ET

it = ET.iterparse(adiPath)
for _, el in it:
    if '}' in el.tag:
        el.tag = el.tag.split('}', 1)[1]  # strip all namespaces
    for at in list(el.attrib):  # strip namespaces of attributes too
        if '}' in at:
            newat = at.split('}', 1)[1]
            el.attrib[newat] = el.attrib[at]
            del el.attrib[at]
root = it.root

# Search for the Rating tag and edit its value
for rating in root.iter('Rating'):
    print(rating.text)  # Prints 18
    rating.text = "999"
    print(rating.text)  # Prints 999

But in this case the XML file remains unchanged.

Here is the XML file:

<?xml version="1.0" encoding="utf-8"?>
<ADI3 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:content="urn:cablelabs:md:xsd:content:3.0" xmlns:core="urn:cablelabs:md:xsd:core:3.0" xmlns:offer="urn:cablelabs:md:xsd:offer:3.0" xmlns:terms="urn:cablelabs:md:xsd:terms:3.0" xmlns:title="urn:cablelabs:md:xsd:title:3.0" xmlns:adb="urn:adb:md:xsd:adb:01" xmlns:schemaLocation="urn:adb:md:xsd:adb:01 ADB-EXT-C01.xsd urn:cablelabs:md:xsd:core:3.0 MD-SP-CORE-C01.xsd urn:cablelabs:md:xsd:content:3.0 MD-SP-CONTENT-C01.xsd urn:cablelabs:md:xsd:offer:3.0 MD-SP-OFFER-C01.xsd urn:cablelabs:md:xsd:terms:3.0 MD-SP-TERMS-C01.xsd urn:cablelabs:md:xsd:title:3.0 MD-SP-TITLE-C01.xsd" xmlns:xml="http://www.w3.org/XML/1998/namespace" xmlns="urn:cablelabs:md:xsd:core:3.0">
  <Asset xsi:type="title:TitleType" uriId="ab://cc.com" providerVersionNum="1" internalVersionNum="0" creationDateTime="2020-01-28T08:55:19Z" startDateTime="2019-05-20T00:00:00Z" endDateTime="2028-08-20T23:59:00Z">
    <AlternateId identifierSystem="VOD1.1">ab://cc.com</AlternateId>
    <Ext>
        <adb:ExtensionType>
            <adb:TitleExt>
                <adb:SeriesInfo episodeNumber="6">
                    <adb:series seriesId="GOT" seasonCount="8"></adb:series>
                    <adb:season seasonId="GOTS08" number="8" episodeCount="6"></adb:season>
                </adb:SeriesInfo>
            </adb:TitleExt>
        </adb:ExtensionType>
    </Ext>
    <title:LocalizableTitle xml:lang="pol">
      <title:TitleLong>Game of Thrones VIII</title:TitleLong>
      <title:SummaryLong>Long summary, long summary, long summary...</title:SummaryLong>
      <title:Actor fullName="Peter Dinklage" firstName="Peter" lastName="Dinklage" />
      <title:Actor fullName="Nikolaj Coster-Waldau" firstName="Nikolaj" lastName="Coster-Waldau" />
      <title:Actor fullName="Emilia Clarke" firstName="Emilia" lastName="Clarke" />
      <title:Actor fullName="Lena Headey" firstName="Lena" lastName="Headey" />
      <title:Director fullName="David Nutter" firstName="David" lastname="Nutter" />
    </title:LocalizableTitle>
    <title:Rating ratingSystem="PL">18</title:Rating>
    <title:Audience>General</title:Audience>
    <title:DisplayRunTime>01:15</title:DisplayRunTime>
    <title:Year>2019</title:Year>
    <title:CountryOfOrigin>US</title:CountryOfOrigin>
    <title:Genre>Film fantasy</title:Genre>
    <title:ShowType>Movie</title:ShowType>
  </Asset>
  <Asset xsi:type="offer:CategoryType" uriId="cc.com/XX">
    <AlternateId identifierSystem="VOD1.1">cc.com/XX</AlternateId>
    <offer:CategoryPath>VOD/GOT/Season 8</offer:CategoryPath>
  </Asset>
  <Asset xsi:type="content:MovieType" uriId="GraoTronVIII_0_1080mp4">
    <AlternateId identifierSystem="VOD1.1">GraoTronVIII_0_1080mp4</AlternateId>
    <content:SourceUrl>GOTS08E06.mp4</content:SourceUrl>
    <content:Resolution>1080p</content:Resolution>
    <content:Duration>PT1H15M20S</content:Duration>
    <content:Language>pol</content:Language>
    <content:Language>eng</content:Language>
  </Asset>
  <Asset xsi:type="content:PreviewType" uriId="GraoTronVIII_1_1080mp4">
    <AlternateId identifierSystem="VOD1.1">GraoTronVIII_1_1080mp4</AlternateId>
    <content:SourceUrl>GOTS08E06_trailer.mp4</content:SourceUrl>
    <content:Resolution>1080p</content:Resolution>
    <content:Duration>PT0H01M48S</content:Duration>
    <content:Language>pol</content:Language>
    <content:Language>eng</content:Language>
  </Asset>
  <Asset xsi:type="content:PosterType" uriId="GraoTronVIIIPoster">
    <AlternateId identifierSystem="VOD1.1">GraoTronVIIIPoster</AlternateId>
    <content:SourceUrl>GOTS08E06.jpg</content:SourceUrl>
    <content:X_Resolution>600</content:X_Resolution>
    <content:Y_Resolution>900</content:Y_Resolution>
    <content:Language>pol</content:Language>
  </Asset>
  <Asset xsi:type="offer:ContentGroupType" uriId="abc">
    <AlternateId identifierSystem="VOD1.1">abc</AlternateId>
    <offer:TitleRef uriId="abc" />
    <offer:MovieRef uriId="GraoTronVIII_0_1080mp4" />
  </Asset>
  <Asset xsi:type="offer:ContentGroupType" uriId="abc">
    <AlternateId identifierSystem="VOD1.1">abc</AlternateId>
    <offer:TitleRef uriId="abc" />
    <offer:MovieRef uriId="GraoTronVIII_1_1080mp4" />
  </Asset>
  <Asset xsi:type="offer:ContentGroupType" uriId="abc">
    <AlternateId identifierSystem="VOD1.1">abc</AlternateId>
    <offer:TitleRef uriId="abc" />
    <offer:MovieRef uriId="GraoTronVIIIPoster" />
  </Asset>
</ADI3>

I suggest using a namespace wildcard instead of stripping the namespaces. Support for this was added in Python 3.8.

from xml.etree import ElementTree as ET

tree = ET.parse(adiPath)

rating = tree.find(".//{*}Rating")  # Find the Rating element in any namespace
rating.text = "999"

Note that you must use find() (or findall()). The wildcard does not work with iter().
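
To make the difference concrete, here is a minimal sketch of the whole round trip: look up Rating with the wildcard, edit its text and an attribute, and serialize the result. The SAMPLE string is a hypothetical, shortened stand-in for the file in the question, and the output goes to an in-memory buffer so the snippet runs without any file on disk. The attribute value "XX" is only an illustration.

```python
import io
from xml.etree import ElementTree as ET

# Hypothetical, shortened stand-in for the ADI3 file from the question.
SAMPLE = (
    '<ADI3 xmlns="urn:cablelabs:md:xsd:core:3.0" '
    'xmlns:title="urn:cablelabs:md:xsd:title:3.0">'
    '<Asset><title:Rating ratingSystem="PL">18</title:Rating></Asset>'
    '</ADI3>'
)

root = ET.fromstring(SAMPLE)
tree = ET.ElementTree(root)

# iter() compares the tag string literally, so the wildcard form finds nothing.
via_iter = list(root.iter("{*}Rating"))

# findall() understands the {*} wildcard (Python 3.8 or later).
ratings = root.findall(".//{*}Rating")
for rating in ratings:
    rating.text = "999"               # edit the element text
    rating.set("ratingSystem", "XX")  # edit an attribute (illustrative value)

# Nothing is saved until write() is called; here it targets a buffer.
out = io.BytesIO()
tree.write(out, encoding="utf-8", xml_declaration=True)
result = out.getvalue().decode("utf-8")
print(result)
```

With a real file, the same write() call would take a path instead of the buffer, for example tree.write(adiPath, encoding="utf-8", xml_declaration=True), which overwrites the input file. Without registered prefixes, the serialized tags use generated prefixes such as ns0.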


The following workaround can be used to preserve the original namespace prefixes when serializing the XML document.

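# Collect every prefix-to-URI mapping declared in the document
namespaces = dict(elem for _, elem in ET.iterparse("test1.xml", events=['start-ns']))
# Register each prefix so that it is reused on serialization
for prefix, uri in namespaces.items():
    ET.register_namespace(prefix, uri)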
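
As a rough sketch of how this workaround combines with the wildcard lookup, the snippet below registers the prefixes, edits Rating, and checks that the serialized output keeps the title: prefix. SAMPLE is again a hypothetical, shortened stand-in for the real file, and an in-memory buffer replaces the output file.

```python
import io
from xml.etree import ElementTree as ET

# Hypothetical, shortened stand-in for the ADI3 file from the question.
SAMPLE = (
    b'<ADI3 xmlns:title="urn:cablelabs:md:xsd:title:3.0" '
    b'xmlns="urn:cablelabs:md:xsd:core:3.0">'
    b'<Asset><title:Rating ratingSystem="PL">18</title:Rating></Asset>'
    b'</ADI3>'
)

# Each start-ns event carries a (prefix, uri) pair; register every one of them.
for _, (prefix, uri) in ET.iterparse(io.BytesIO(SAMPLE), events=["start-ns"]):
    ET.register_namespace(prefix, uri)

tree = ET.parse(io.BytesIO(SAMPLE))
tree.find(".//{*}Rating").text = "999"

# Serialize to a buffer; pass a file path instead to save to disk.
out = io.BytesIO()
tree.write(out, encoding="utf-8", xml_declaration=True)
result = out.getvalue().decode("utf-8")
print(result)
```

Two things to keep in mind: register_namespace() changes a module-wide registry, so it affects every later serialization in the same process, and ElementTree only writes declarations for namespaces that are actually used by some element or attribute, so unused xmlns declarations from the input do not reappear in the output.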