Memory full and other problems when parsing a large XML file

Question

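I have an XML trace file that is about 350 MB in size. When I run the code below, one time it fails with a memory-full error, and another time it fails with an error saying the file cannot be parsed. What should I do to parse a file this large? Should I parse it some other way?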

    root = ET.parse('E:/software/jm_16.1/bin/tracefile.xml').getroot()
    lst = root.findall('AVCTrace/Picture/SubPicture/Slice/MacroBlock')
    for item in lst:
        print(item.get('QP_Y'))

I also produced a smaller file based on the one above, and with it the variable `lst` is empty. Do you know what the problem is? I also need to extract the `X` and `Y` tags inside each `MacroBlock`.

My XML trace file is as follows:

            <MacroBlock num="8158">
                <SubMacroBlock num="0">
                    <Type>1</Type>
                    <TypeString>B_L0_8x8</TypeString>
                    <MotionVector list="0">
                        <RefIdx>0</RefIdx>
                        <Difference>
                            <X>-1</X>
                            <Y>-2</Y>
                        </Difference>
                        <Absolute>
                            <X>-4</X>
                            <Y>-6</Y>
                        </Absolute>
                    </MotionVector>
                </SubMacroBlock>

Answer 1

    import xml.etree.ElementTree as ET


    root = ET.parse('68071609.xml').getroot()
    print(root.tag)  # Picture
    elems = root.findall('SubPicture/Slice/MacroBlock/QP_Y')
    for elem in elems:
        print(elem.text)  # 28

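The root of your tree is already `Picture`, so you should not search for `Picture/...` inside it: `findall` paths are relative to the element you call it on.
You can search directly for all nodes named `QP_Y` by appending it to your search path.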
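To illustrate why `lst` comes back empty, here is a minimal sketch. The inline XML is a hypothetical miniature of the trace file, assuming `QP_Y` is a child element of `MacroBlock` as in the code above; it is not the real file.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature document, shaped like the trace file in the question.
SAMPLE = """
<Picture>
  <SubPicture>
    <Slice>
      <MacroBlock num="0"><QP_Y>28</QP_Y></MacroBlock>
      <MacroBlock num="1"><QP_Y>30</QP_Y></MacroBlock>
    </Slice>
  </SubPicture>
</Picture>
"""

root = ET.fromstring(SAMPLE)

# This path repeats the root tag, so it looks for a Picture child of Picture.
with_root = root.findall('Picture/SubPicture/Slice/MacroBlock')

# This path starts at the root's children, so it matches.
without_root = root.findall('SubPicture/Slice/MacroBlock')

# './/' searches all descendants, whatever the nesting depth.
anywhere = root.findall('.//MacroBlock')

print(len(with_root), len(without_root), len(anywhere))  # 0 2 2
```

If you are unsure what the root of your own file is, printing `root.tag` first shows where the search path has to start.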

If you prefer to iterate over the macroblocks and read `QP_Y` from each one:

    elems = root.findall('SubPicture/Slice/MacroBlock')
    for elem in elems:
        print(elem.attrib)  # {'num': '0'}
        qp_y = next(child for child in elem if child.tag == "QP_Y").text  # raises StopIteration if QP_Y is missing
        print(qp_y)  # 28
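For the memory problem with the 350 MB file, a streaming parse with `ET.iterparse` may help, since `ET.parse` builds the whole tree in memory. The following is a minimal sketch, not a tested solution for your file: the function name `iter_macroblocks` and the inline sample are hypothetical, and it assumes `QP_Y` is a child element of `MacroBlock` and that every `MotionVector` has an `Absolute` block with `X` and `Y`, as in the fragment from the question.

```python
import io
import xml.etree.ElementTree as ET


def iter_macroblocks(source):
    """Yield (num, QP_Y text, absolute motion vectors) one MacroBlock at a time.

    `source` is a filename or a binary file object.
    """
    # With the default events, iterparse reports each element when it closes.
    for _, elem in ET.iterparse(source):
        if elem.tag == 'MacroBlock':
            vectors = [
                (int(mv.findtext('Absolute/X')), int(mv.findtext('Absolute/Y')))
                for mv in elem.iter('MotionVector')
            ]
            yield elem.get('num'), elem.findtext('QP_Y'), vectors
            # Free the children of this MacroBlock; only an empty stub
            # stays attached to its parent Slice.
            elem.clear()


# Hypothetical sample, assembled from the fragment in the question.
SAMPLE = b"""<Picture><SubPicture><Slice>
<MacroBlock num="8158"><QP_Y>28</QP_Y>
<SubMacroBlock num="0"><MotionVector list="0"><RefIdx>0</RefIdx>
<Difference><X>-1</X><Y>-2</Y></Difference>
<Absolute><X>-4</X><Y>-6</Y></Absolute>
</MotionVector></SubMacroBlock></MacroBlock>
</Slice></SubPicture></Picture>"""

for num, qp_y, vectors in iter_macroblocks(io.BytesIO(SAMPLE)):
    print(num, qp_y, vectors)  # 8158 28 [(-4, -6)]
```

For the real file you would pass the path instead of the `BytesIO` object. Swapping `Absolute` for `Difference` in the two `findtext` paths would give the differential vectors instead.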
