etree: insert a node attribute into filtered children
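I'm working with an XML file. I want to produce a list of tuples for a bulk insert into a database.
The problem I can't seem to solve is inserting the @id from a node alongside selected attributes of its child nodes.
Here is my sample document. Note that in my real file, every level has many more attributes that need to be filtered out. I created this XML file as a more useful example.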
doc = """
<region id="5153419" name="North Shore" date="2019-02-15T00:00:00" >
<shire abbrevname="Manly Council" code="20019" website="http://" >
<location id="5178566" site="1" division="Dee Why" staff="3" >
<reference isbn="978-1-891830-75-4" rating="Mature (18+)" title="110 Per¢" author="Tony Consiglio"/>
<reference isbn="978-1-60309-2395" rating="Mature (16+)" title="American Elf 1999" author="James Kochalka" />
<reference isbn="978-1-891830-37-2" rating="Young Adult (13+)" title="The Barefoot Serpent (softcover)" author="Scott Morse" />
<reference isbn="978-1-891830-56-3" rating="Mature (16+)" title="Bighead" author="Jeffrey Brown" />
<reference isbn="978-1-891830-19-8" rating="Mature (18+)" title="Box Office Poison" author="Alex Robinson" />
</location>
<location id="5178568" site="2" division="Brookvale" staff="5">
<reference isbn="978-1-891830-37-2" rating="Young Adult (13+)" title="The Barefoot Serpent (softcover)" author="Scott Morse"/>
<reference isbn="978-1-936561-69-8" rating="Adults Only (18+)" title="Chester 5000 (Book 2)" author="Isabelle George" />
<reference isbn="978-1-891830-81-5" rating="Young Adult (13+)" title="Cry Yourself to Sleep" author="Jeremy Tinder" />
<reference isbn="978-1-891830-75-4" rating="Mature (18+)" title="110 Per¢" author="Tony Consiglio" />
<reference isbn="978-1-891830-77-8" rating="Mature (16+)" title="Every Girl is the End of the World for Me" author="Jeffrey Brown" />
<reference isbn="978-0-9585783-4-9" rating="Mature (18+)" title="From Hell" author="Alan Moore and Eddie Campbell" />
</location>
</shire>
</region>
"""
The output I want is
(location id, isbn, title)
[(5153419, 978-1-891830-75-4,110 Per¢),(5153419, 978-1-60309-2395, American Elf 1999).......(5178568,978-0-9585783-4-9,From Hell)]
I've tried many approaches, including getiterator and findall, but I can't find a way to do it.
filter_reference = ['isbn', 'title']
output_list = []
for child in tree.findall('.//reference'):
    for k, v in child.items():
        if k in filter_reference:
            output_list.append(v)
Iterate over the children and pick out the attributes you need:
import xml.etree.ElementTree as et
doc = """
your doc
"""
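root = et.fromstring(doc)
filter_reference = ['isbn', 'title']  # attributes to keep from each <reference>
result = []
for shire in root:
    for location in shire:
        location_id = location.attrib.get('id')
        for reference in location:
            list_of_attribs = [reference.attrib.get(x) for x in filter_reference]
            # Unpack so each row is a flat (location_id, isbn, title) tuple
            result.append((location_id, *list_of_attribs))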
print(result) # [('5178566', '978-1-891830-75-4', '110 Per¢'), ('5178566', '978-1-60309-2395', 'American Elf 1999'), ('5178566', '978-1-891830-37-2', 'The Barefoot Serpent (softcover)'), ('5178566', '978-1-891830-56-3', 'Bighead'), ('5178566', '978-1-891830-19-8', 'Box Office Poison'), ('5178568', '978-1-891830-37-2', 'The Barefoot Serpent (softcover)'), ('5178568', '978-1-936561-69-8', 'Chester 5000 (Book 2)'), ('5178568', '978-1-891830-81-5', 'Cry Yourself to Sleep'), ('5178568', '978-1-891830-75-4', '110 Per¢'), ('5178568', '978-1-891830-77-8', 'Every Girl is the End of the World for Me'), ('5178568', '978-0-9585783-4-9', 'From Hell')]
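If the real file nests `<location>` at a different depth than the sample, the nested loops above may need adjusting. Here is a minimal sketch of an alternative that uses `Element.iter()` to find every `<location>` regardless of depth, then feeds the rows to `executemany` for the bulk insert. The helper name `extract_rows`, the shortened `sample` document, and the `reference_at_location` table are assumptions made up for illustration. SQLite is used only because it ships with Python; the question does not say which database is involved.

```python
import sqlite3
import xml.etree.ElementTree as ET


def extract_rows(xml_text, wanted=("isbn", "title")):
    """Return one (location_id, *wanted attributes) tuple per <reference>."""
    root = ET.fromstring(xml_text)
    rows = []
    # iter() walks the whole tree, so <location> is found at any depth.
    for location in root.iter("location"):
        location_id = location.get("id")
        # Only the direct <reference> children of this <location>.
        for reference in location.findall("reference"):
            rows.append((location_id, *(reference.get(name) for name in wanted)))
    return rows


# Shortened version of the sample document from the question.
sample = """
<region id="5153419">
  <shire code="20019">
    <location id="5178566" site="1">
      <reference isbn="978-1-891830-56-3" rating="Mature (16+)" title="Bighead" author="Jeffrey Brown"/>
    </location>
    <location id="5178568" site="2">
      <reference isbn="978-0-9585783-4-9" rating="Mature (18+)" title="From Hell" author="Alan Moore and Eddie Campbell"/>
    </location>
  </shire>
</region>
"""

rows = extract_rows(sample)

# Hypothetical table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reference_at_location (location_id TEXT, isbn TEXT, title TEXT)"
)
conn.executemany("INSERT INTO reference_at_location VALUES (?, ?, ?)", rows)
stored = conn.execute("SELECT COUNT(*) FROM reference_at_location").fetchone()[0]
```

Attribute values always come back as strings, so wrap `location_id` in `int()` if the target column is numeric. A missing attribute yields `None`, which SQLite stores as NULL.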