Python lxml - Normalize all nested child elements into a dictionary
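I'm trying to parse security OVAL XML definition files in order to automate tests.
What I want to achieve is, for each definition, to convert its criteria and criterion elements into a dictionary.
The criteria XML structure looks like this: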
<criteria operator="AND">
  <criteria comment="Affected IOSXE configuration" operator="AND">
    <criterion comment="ASR 1000 series router" test_ref="oval:org.cisecurity:tst:5943" />
    <criteria comment="Affected IOSXE configuration" operator="OR">
      <criteria comment="Zone-based firewall configured" operator="AND">
        <criterion comment="Match TCP or UDP" test_ref="oval:org.cisecurity:tst:6071" />
        <criterion comment="ZBFW inspection enabled" test_ref="oval:org.cisecurity:tst:5850" />
      </criteria>
      <criteria comment="NAT and PPTP ALG are enabled" operator="AND">
        <criterion comment="NAT configured" test_ref="oval:org.cisecurity:tst:6020" />
        <criterion comment="NAT enabled" test_ref="oval:org.cisecurity:tst:6146" />
        <criterion comment="PPTP ALG disabled" negate="true" test_ref="oval:org.cisecurity:tst:5668" />
      </criteria>
      <criteria comment="NAT and TCP reassembly are enabled" operator="AND">
        <criterion comment="NAT configured" test_ref="oval:org.cisecurity:tst:6020" />
        <criterion comment="NAT enabled" test_ref="oval:org.cisecurity:tst:6146" />
        <criterion comment="Affected processor" test_ref="oval:org.cisecurity:tst:5622" />
      </criteria>
      <criterion comment="EoGRE is enabled" test_ref="oval:org.cisecurity:tst:6003" />
    </criteria>
  </criteria>
  <criterion comment="IOSXE version is affected" test_ref="oval:org.cisecurity:tst:6178" />
</criteria>
I can retrieve and map the first-level criteria with the following code:
# Add OVAL ID attrib in normalized Vulnerability dictionary
for idx, vuln in enumerate(vuln_list):
    vuln['oval_id'] = root.xpath("//ns:definition", namespaces=ns)[idx].attrib['id']
    criteria = root.xpath("//ns:definition[@id='" + vuln_list[idx]['oval_id'] + "']/ns:criteria/*", namespaces=ns)
    vuln['criteria'] = [crit.items() for crit in criteria]
This populates my dictionary with the following result, which is obviously missing the nested child elements:
{'cisco_adv_id': 'cisco-sa-20131030-asr1000',
 'cisco_adv_url': 'http://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-20131030-asr1000',
 'criteria': [[('comment', 'Affected IOSXE configuration'),
               ('operator', 'AND')],
              [('comment', 'IOSXE version is affected'),
               ('test_ref', 'oval:org.cisecurity:tst:6178')]],
 'cve_id': 'CVE-2013-5547',
 'oval_id': 'oval:org.cisecurity:def:4321',
 'title': 'Cisco IOS XE Software Malformed EoGRE Packet Denial of Service '
          'Vulnerability'},
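I could use nested for loops and call getchildren() to check whether each element has children, but that doesn't sound like the best solution, since every definition has one or more criteria/criterion elements and they can nest to any depth.
Any ideas on how to parse this more efficiently?
Thanks in advance.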
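It's relatively easy if you use recursion.
For the first example, I tried to keep the same organization as yours: each criteria element becomes a list holding its attributes followed by its children, but everything is stored as dictionaries rather than tuples.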
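def get_data(el):
    # Assumes un-namespaced tags. In a namespaced document el.tag is
    # '{namespace-uri}criteria'; compare etree.QName(el).localname instead.
    if el.tag == 'criteria':
        # The first list item holds this element's own attributes
        data = {'criteria': [dict(el.attrib)]}
        for desc in el.iterchildren():
            # Recurse into each nested criteria/criterion child
            data['criteria'].append(get_data(desc))
        return data
    else:
        return {'criterion': dict(el.attrib)}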
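To see the shape this produces, here is a minimal sketch that runs the function on a trimmed-down version of the criteria tree from the question. The SAMPLE string is made up for illustration, the tags are assumed to carry no namespace, and the import falls back to the standard library's ElementTree only so the snippet still runs where lxml is not installed (which is also why it iterates with `for desc in el` rather than `iterchildren()`).

```python
try:
    from lxml import etree
except ImportError:  # fallback so the sketch runs without lxml
    import xml.etree.ElementTree as etree

# Illustrative input: a shortened copy of the question's criteria tree
SAMPLE = """
<criteria operator="AND">
  <criteria comment="NAT and PPTP ALG are enabled" operator="AND">
    <criterion comment="NAT configured" test_ref="oval:org.cisecurity:tst:6020" />
  </criteria>
  <criterion comment="IOSXE version is affected" test_ref="oval:org.cisecurity:tst:6178" />
</criteria>
"""

def get_data(el):
    if el.tag == 'criteria':
        # Item 0 is the attribute dict, the remaining items are children
        data = {'criteria': [dict(el.attrib)]}
        for desc in el:  # direct children, in document order
            data['criteria'].append(get_data(desc))
        return data
    return {'criterion': dict(el.attrib)}

root = etree.fromstring(SAMPLE.strip())
result = get_data(root)

attrs = result['criteria'][0]      # {'operator': 'AND'}
nested = result['criteria'][1]     # {'criteria': [...]}
leaf = result['criteria'][2]       # {'criterion': {...}}
```

Every list mixes a plain attribute dict with `{'criteria': ...}` and `{'criterion': ...}` wrappers, so a consumer has to inspect each item to tell them apart.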
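The problem is that the returned data is not easy to use: each criteria list can contain up to three kinds of dictionaries (attributes, criteria, or criterion), and you have to test each item to know which is which. In the second example, you know in advance what each list contains: if the key is criterion, you know you will get a list of criterion dictionaries.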
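def get_data(el):
    # Tags are compared as bare names, which assumes the elements carry no namespace
    if el.tag == 'criteria':
        # Start from this element's attributes, copied into a plain dict
        data = dict(el.attrib)
        for desc in el.iterchildren():
            key = desc.tag  # 'criteria' or 'criterion'
            if key not in data:
                data[key] = []
            # Group children by tag so each list holds a single kind of item
            data[key].append(get_data(desc))
        return data
    else:
        return dict(el.attrib)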
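The question queries with an `ns:` prefix, so the real file is namespaced and `el.tag` will look like `{uri}criteria`. Below is a rough sketch of how the second version might be adapted for that and called once per definition. The namespace URI is what I believe OVAL 5 definition files declare (substitute whatever your file uses), and `local_name`, `SAMPLE`, and `criteria_by_id` are hypothetical names introduced here. The import falls back to the standard library's ElementTree only so the snippet runs without lxml.

```python
try:
    from lxml import etree
except ImportError:  # fallback so the sketch runs without lxml
    import xml.etree.ElementTree as etree

# Assumed namespace URI; replace it with the one declared in your file
NS = {'ns': 'http://oval.mitre.org/XMLSchema/oval-definitions-5'}

# Illustrative input: one definition with a shortened criteria tree
SAMPLE = """
<oval_definitions xmlns="http://oval.mitre.org/XMLSchema/oval-definitions-5">
  <definitions>
    <definition id="oval:org.cisecurity:def:4321">
      <criteria operator="AND">
        <criteria comment="Affected IOSXE configuration" operator="AND">
          <criterion comment="ASR 1000 series router" test_ref="oval:org.cisecurity:tst:5943" />
        </criteria>
        <criterion comment="IOSXE version is affected" test_ref="oval:org.cisecurity:tst:6178" />
      </criteria>
    </definition>
  </definitions>
</oval_definitions>
"""

def local_name(el):
    """Return the tag without its '{namespace}' prefix."""
    return el.tag.rpartition('}')[2]

def get_data(el):
    data = dict(el.attrib)
    if local_name(el) == 'criteria':
        for child in el:
            if not isinstance(child.tag, str):
                continue  # skip XML comments and processing instructions
            # Group children by local tag name: 'criteria' or 'criterion'
            data.setdefault(local_name(child), []).append(get_data(child))
    return data

root = etree.fromstring(SAMPLE.strip())
criteria_by_id = {}
for definition in root.findall('.//ns:definition', namespaces=NS):
    top = definition.find('ns:criteria', namespaces=NS)
    if top is not None:
        criteria_by_id[definition.get('id')] = get_data(top)
```

Looping over the definition elements directly avoids re-running an XPath query for every index, and keying the result by the definition id makes it straightforward to merge into each entry of `vuln_list` afterwards.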