Collect values of child tag using Python lxml

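I am using the lxml library in Python 2.6 to extract data from an XML file. The document contains many <Employee> tags. I iterate over each <Employee> tag, create a new instance of my Employee class, and set its member variables from the values on the Employee tag.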

    read_CA_tree = etree.parse(xml_tree, parser)
    all_employees = []
    for employee_tag in read_CA_tree.iter("Employee"):
        employee = Employee(employee_tag)
        all_employees.append(employee)

An <Employee> tag can also have one or more <EmailAddress> child tags, like this:

    <Employee ID="124" Name="Foo Bar" Title="Baz">
        <EmailAddress ID="124" Address="foobar@fizzbang.com" />
    </Employee>

My Employee objects are instantiated by calling the get() method on the lxml Element:

    class Employee(object):

        def __init__(self, employee_tag):
            self.Employee_ID = employee_tag.get("EmployeeID")
            self.First_Name = employee_tag.get("FirstName")
            self.Email_Addresses = self._collect_emails(read_CA_tree, "EmailAddress")

        def _collect_emails(self, tree, tag):
            known_addr = []
            for i in tree.iter(tag):
                known_addr.append(i)
            return known_addr

For each Employee tag, how can I collect the value of Address from its child <EmailAddress> tags and add the resulting list of email addresses in my Employee class constructor?

From the docs:

Elements carry attributes as a dict
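
As a rough illustration of that dict-like behavior, here is a small sketch using the sample element from the question. The fallback import of `xml.etree.ElementTree` is an assumption added only so the snippet also runs where lxml is not installed; the calls used here behave the same way in both.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the stdlib ElementTree, whose API matches for these calls
    import xml.etree.ElementTree as etree

# The sample <Employee> element from the question
employee_tag = etree.fromstring(
    '<Employee ID="124" Name="Foo Bar" Title="Baz">'
    '<EmailAddress ID="124" Address="foobar@fizzbang.com" />'
    '</Employee>'
)

# get() looks up one attribute by name and returns None if it is absent
name = employee_tag.get("Name")
missing = employee_tag.get("FirstName")

# get() also accepts a default, just like dict.get()
missing_with_default = employee_tag.get("FirstName", "")

# attrib exposes all attributes as a dict-like mapping
attrs = dict(employee_tag.attrib)
```

Note that `get("FirstName")` returns None here because the sample element only carries `ID`, `Name`, and `Title`.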

So you could try:

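    def _collect_emails(self, tree, tag):
        # Collect the Address attribute of every matching element under `tree`
        email_addr = []
        for i in tree.iter(tag):
            # get() returns the default ('') when the attribute is missing
            email_addr.append(i.get('Address', ''))
        return email_addr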
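
One thing to watch: if the whole parsed tree is passed in, `iter(tag)` walks every <EmailAddress> in the document, not only those under the current <Employee>. A minimal end-to-end sketch, assuming the constructor passes `employee_tag` itself instead, might look like the following. The wrapping `<Employees>` root, the second employee, and the second address are hypothetical sample data, and the attribute names `ID` and `Name` are taken from the sample XML rather than from the question's `get()` calls.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the stdlib ElementTree, whose API matches for these calls
    import xml.etree.ElementTree as etree

# Hypothetical sample document built around the element from the question
SAMPLE = """
<Employees>
  <Employee ID="124" Name="Foo Bar" Title="Baz">
    <EmailAddress ID="124" Address="foobar@fizzbang.com" />
    <EmailAddress ID="124" Address="foo@example.com" />
  </Employee>
  <Employee ID="125" Name="Fizz Bang" Title="Qux" />
</Employees>
""".strip()


class Employee(object):

    def __init__(self, employee_tag):
        self.Employee_ID = employee_tag.get("ID")
        self.Name = employee_tag.get("Name")
        # Search only below this employee's own element, not the whole tree
        self.Email_Addresses = self._collect_emails(employee_tag, "EmailAddress")

    def _collect_emails(self, parent, tag):
        # One Address string per matching descendant; '' if the attribute is missing
        return [child.get("Address", "") for child in parent.iter(tag)]


root = etree.fromstring(SAMPLE)
all_employees = [Employee(tag) for tag in root.iter("Employee")]
```

With this arrangement each Employee ends up with only its own addresses, and an employee without any <EmailAddress> children gets an empty list.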