Memory full and other problems parsing a large XML file
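I have an XML trace file of about 350 MB. When I use the code below, one run fails with a memory-full problem and another run fails with an error saying the file cannot be parsed. What should I do to parse such a large file? Should I parse it some other way?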
import xml.etree.ElementTree as ET
root = ET.parse('E:/software/jm_16.1/bin/tracefile.xml').getroot()
lst = root.findall('AVCTrace/Picture/SubPicture/Slice/MacroBlock')
for item in lst:
    print(item.get('QP_Y'))
I also produced a smaller file based on the one above, and with it the variable `lst` is empty. Do you know what the problem is?
I also need to extract the X and Y tags inside each MacroBlock.
My XML trace file is as follows:
<MacroBlock num="8158">
    <SubMacroBlock num="0">
        <Type>1</Type>
        <TypeString>B_L0_8x8</TypeString>
        <MotionVector list="0">
            <RefIdx>0</RefIdx>
            <Difference>
                <X>-1</X>
                <Y>-2</Y>
            </Difference>
            <Absolute>
                <X>-4</X>
                <Y>-6</Y>
            </Absolute>
        </MotionVector>
    </SubMacroBlock>
import xml.etree.ElementTree as ET
root = ET.parse('68071609.xml').getroot()
print(root.tag)  # Picture
elems = root.findall('SubPicture/Slice/MacroBlock/QP_Y')
for elem in elems:
    print(elem.text)  # 28
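The root of your tree is already Picture, so you should not search for Picture/... inside it. Paths passed to findall are resolved relative to the element you call it on.
You can search directly for all nodes named QP_Y by adding that tag to the end of your search path.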
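To make the relative-path behaviour concrete, here is a minimal sketch on an inline document. The sample XML and its values are made up for illustration; only the tag names come from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the trace file; the values are illustrative only.
SAMPLE = """
<Picture>
  <SubPicture>
    <Slice>
      <MacroBlock num="0"><QP_Y>28</QP_Y></MacroBlock>
      <MacroBlock num="1"><QP_Y>30</QP_Y></MacroBlock>
    </Slice>
  </SubPicture>
</Picture>
"""

root = ET.fromstring(SAMPLE.strip())

# Paths are resolved relative to the root element, so repeating the
# root's own tag looks for a Picture child that does not exist.
wrong = root.findall('Picture/SubPicture/Slice/MacroBlock')

# Starting from the root's children matches as expected.
right = root.findall('SubPicture/Slice/MacroBlock')

# './/QP_Y' matches QP_Y elements at any depth below the root.
qp_values = [e.text for e in root.findall('.//QP_Y')]

print(len(wrong), len(right), qp_values)
```

The same reasoning applies to the large file: if its root is AVCTrace, the path should start at Picture rather than at AVCTrace.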
If you prefer to iterate over the macroblocks and get the QP_Y of each one:
elems = root.findall('SubPicture/Slice/MacroBlock')
for elem in elems:
    print(elem.attrib)  # {'num': '0'}
    qp_y = next(child for child in elem if child.tag == "QP_Y").text  # raises StopIteration if QP_Y is missing
    print(qp_y)  # 28
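The code above still loads the whole tree, so it does not address the memory problem with the 350 MB file. One possible approach, not part of the answer above, is to stream the file with ET.iterparse and clear each MacroBlock once it has been read. The sketch below assumes QP_Y is a direct child of MacroBlock, as in the code above; iter_qp_y and the inline sample data are hypothetical names and values used only for illustration.

```python
import io
import xml.etree.ElementTree as ET

def iter_qp_y(source):
    """Yield (num, QP_Y text) for each MacroBlock without keeping its contents.

    `source` may be a file name or a binary file object.
    """
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "MacroBlock":
            # Copy out the strings we need before discarding the subtree.
            yield elem.get("num"), elem.findtext("QP_Y")
            # Drop the element's children, text and attributes. An empty
            # MacroBlock shell stays attached to its parent, which is far
            # smaller than the full subtree.
            elem.clear()

# Illustrative stand-in for the real trace file.
SAMPLE = b"""<AVCTrace><Picture><SubPicture><Slice>
<MacroBlock num="0"><QP_Y>28</QP_Y></MacroBlock>
<MacroBlock num="1"><QP_Y>26</QP_Y></MacroBlock>
</Slice></SubPicture></Picture></AVCTrace>"""

results = list(iter_qp_y(io.BytesIO(SAMPLE)))
print(results)
```

Because the tag is matched as each element closes, this works whether the root is AVCTrace or Picture. For the real file, pass its path instead of the BytesIO object and consume the generator in a loop rather than collecting it into a list.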