Collect values of child tags using Python lxml
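I'm using the lxml library in Python 2.6 to extract data from an XML file. The document contains many <Employee> tags. I iterate over each <Employee> tag, create a new instance of my Employee class, and set its member variables from the values on that tag.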
from lxml import etree

read_CA_tree = etree.parse(xml_tree, parser)
all_employees = []
for employee_tag in read_CA_tree.iter("Employee"):
    employee = Employee(employee_tag)
    all_employees.append(employee)
An <Employee> tag can also have one or more <EmailAddress> child tags, like this:
<Employee ID="124" Name="Foo Bar" Title="Baz">
<EmailAddress ID="124" Address="foobar@fizzbang.com" />
</Employee>
My Employee object is populated by calling the get() method on the lxml Element:
class Employee(object):
    def __init__(self, employee_tag):
        self.Employee_ID = employee_tag.get("EmployeeID")
        self.First_Name = employee_tag.get("FirstName")
        self.Email_Addresses = self._collect_emails(read_CA_tree, "EmailAddress")

    def _collect_emails(self, tree, tag):
        known_addr = []
        for i in tree.iter(tag):
            known_addr.append(i)
        return known_addr
For each Employee tag, how can I collect the Address values from its child <EmailAddress> tags and add that list of email addresses to my Employee object in the class constructor?
Elements carry attributes as a dict, so you can read Address with get(). You could try:
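def _collect_emails(self, tree, tag):
    # Pass employee_tag as `tree` to limit the search to that employee.
    known_addr = []  # the matching <EmailAddress> elements
    email_addr = []  # their Address attribute values
    for i in tree.iter(tag):
        known_addr.append(i)
        email_addr.append(i.get('Address', ''))
    return email_addr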
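Note that calling iter() on the whole tree visits every <EmailAddress> in the document, so each Employee would end up with everyone's addresses. Here is a minimal, self-contained sketch of one way to scope the search to a single employee, assuming the attribute names from the sample XML in the question (ID, Name, Address). The SAMPLE_XML string, the second employee, and the lowercase member names are made up for illustration, and the ElementTree fallback is only there so the sketch runs where lxml is not installed; the calls used (fromstring, iter, findall, get) exist in both.

```python
try:
    from lxml import etree
except ImportError:  # fallback so the sketch runs without lxml installed
    import xml.etree.ElementTree as etree

# Hypothetical sample document, modeled on the snippet in the question.
SAMPLE_XML = """
<Employees>
  <Employee ID="124" Name="Foo Bar" Title="Baz">
    <EmailAddress ID="124" Address="foobar@fizzbang.com" />
    <EmailAddress ID="124" Address="foo@example.com" />
  </Employee>
  <Employee ID="125" Name="Jane Doe" Title="Qux" />
</Employees>
"""


class Employee(object):
    def __init__(self, employee_tag):
        # Attribute names follow the sample XML above (an assumption).
        self.employee_id = employee_tag.get("ID")
        self.name = employee_tag.get("Name")
        self.email_addresses = self._collect_emails(employee_tag, "EmailAddress")

    def _collect_emails(self, employee_tag, tag):
        # findall() with a bare tag name matches direct children only,
        # so addresses belonging to other employees are not picked up.
        return [child.get("Address", "") for child in employee_tag.findall(tag)]


root = etree.fromstring(SAMPLE_XML)
all_employees = [Employee(tag) for tag in root.iter("Employee")]
for employee in all_employees:
    print(employee.employee_id, employee.email_addresses)
```

Passing employee_tag instead of the module-level tree also removes the constructor's dependency on a global variable, and an employee with no <EmailAddress> children simply gets an empty list.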