python lxml loop on match get next entry
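I am using LXML to query multiple XML files that contain various product data elements. This part of the code takes a list of missing product_ids and queries the XML files to get the data elements for those products.
One of my core problems is that every product_id obtained via xpath is checked against every item in the list products_missing_from_postgresql, which takes forever (hours).
How do I restart the for entry in entries loop once a match has been found?
Maybe this isn't the right question... if not, what is the right question?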
# this code is for testing purposes
for product_number in products_missing_from_postgresql:
    for entry in entries:
        product_id = entry.xpath('@id')[0]
        if product_id != product_number:
            print('************************')
            print('current product: ' + product_id)
            print('no match: ' + product_number)
            print('************************')
        else:
            print('************************')
            print('************************')
            print('product to match: ' + product_number)
            print('matched from entry: ' + product_id)
            print('************************')
            print('************************')
Output of the test code:
************************
************************
product to match: B3F2H-STH
matched from entry: B3F2H-STH
************************
************************
************************
current product: B3F2H-STL
no match: B3F2H-STH
************************
************************
current product: B3F2H-004
no match: B3F2H-STH
************************
This is the production code:
for product_number in products_missing_from_postgresql:
    for entry in entries:
        product_id = entry.xpath('@id')[0]
        if product_id != product_number:
            # used for testing
            print('no match: ' + product_number)
        else:
            # the element @id has multiple items linked that I need to acquire.
            product_id = entry.xpath('@id')[0]
            missing_products_to_add.append(product_id)
            product_name = entry.xpath('@name')[0]
            missing_products_to_add.append(product_name)
            product_type = entry.xpath('@type')[0]
            missing_products_to_add.append(product_type)
            product_price = entry.xpath('@price')[0]
            missing_products_to_add.append(product_price)
Don't use the inner for loop; let XPath do the matching instead.
for product_number in products_missing_from_postgresql:
    entries = xml_tree.xpath("//entry[@id = '%s']" % product_number)
    if entries:
        print('FOUND: ' + product_number)
    else:
        print('NOT FOUND: ' + product_number)
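To tie this back to the production code in the question, here is a minimal runnable sketch of the same per-product query that also collects the attributes of each match. The sample XML, its attribute values, and the id NOT-IN-FILE are made up for illustration; the real files are not shown in the post.

```python
from lxml import etree

# Hypothetical sample document (the real XML files are not shown in the post).
SAMPLE_XML = b"""<products>
  <entry id="B3F2H-STH" name="Widget H" type="widget" price="9.99"/>
  <entry id="B3F2H-STL" name="Widget L" type="widget" price="8.49"/>
  <entry id="B3F2H-004" name="Widget 4" type="gadget" price="12.00"/>
</products>"""

xml_tree = etree.fromstring(SAMPLE_XML)
products_missing_from_postgresql = ["B3F2H-STH", "NOT-IN-FILE"]

missing_products_to_add = []
for product_number in products_missing_from_postgresql:
    # One XPath query per product replaces the inner Python loop.
    matches = xml_tree.xpath("//entry[@id = '%s']" % product_number)
    for entry in matches:
        # Element.get() returns the attribute value, or None if it is absent.
        missing_products_to_add.append(
            (entry.get("id"), entry.get("name"), entry.get("type"), entry.get("price"))
        )
```

Storing one tuple per product, rather than appending four separate values to a flat list, is a choice made here for readability; the question's code uses a flat list.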
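If your product_number can contain single quotes, the code above will break. It is generally better to use a placeholder in the XPath expression and pass the actual value separately: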
entries = xml_tree.xpath("//entry[@id = $value]", value=product_number)
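The difference can be seen with an id that contains a single quote. This is a small sketch; the id O'Neil-01 and the sample document are invented for the demonstration.

```python
from lxml import etree

# Hypothetical document whose id contains a single quote.
xml_tree = etree.fromstring(
    b"""<products><entry id="O'Neil-01" name="Sample"/></products>"""
)
product_number = "O'Neil-01"

# String formatting yields the malformed expression //entry[@id = 'O'Neil-01'].
try:
    xml_tree.xpath("//entry[@id = '%s']" % product_number)
    formatted_query_failed = False
except etree.XPathError:
    formatted_query_failed = True

# An XPath variable is passed as a value, so the quote needs no escaping.
entries = xml_tree.xpath("//entry[@id = $value]", value=product_number)
```

Catching the base class etree.XPathError keeps the sketch independent of which specific XPath exception lxml raises for the malformed expression.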
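Try putting your IDs into a set and comparing against it once. This avoids the nested loop and evaluates the XPath for each entry only once, instead of re-querying the tree over and over...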
ids = {pid for entry in entries for pid in entry.xpath('@id')}
for product_number in products_missing_from_postgresql:
    if product_number in ids:
        pass  # whatever
    else:
        pass  # whatever
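A runnable sketch of the set approach, assuming the ids live on entry elements as in the question. The sample document and the id B3F2H-XXX are hypothetical.

```python
from lxml import etree

# Hypothetical sample document.
xml_tree = etree.fromstring(b"""<products>
  <entry id="B3F2H-STH"/>
  <entry id="B3F2H-STL"/>
  <entry id="B3F2H-004"/>
</products>""")
products_missing_from_postgresql = ["B3F2H-STH", "B3F2H-XXX"]

# A single XPath call returns every id value; the set gives O(1) lookups.
ids = set(xml_tree.xpath("//entry/@id"))

found = [p for p in products_missing_from_postgresql if p in ids]
not_found = [p for p in products_missing_from_postgresql if p not in ids]
```

With the set built once, the total work is roughly one pass over the XML plus one pass over the list, rather than one pass over the XML for every missing product.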
If you also want to retrieve the elements, you can build a dictionary instead of a set:
# index each entry by its own id; an absolute '//*[@id]' here would rescan the whole document for every entry
products = {entry.attrib['id']: entry for entry in entries if 'id' in entry.attrib}
for product_number in products_missing_from_postgresql:
    if product_number in products:
        actual_product = products[product_number]
        # ...
    else:
        pass  # ...
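As a sketch of how the dictionary lookup could feed the data the question needs, the following builds the index once and then copies the attributes of each matched element. The sample document and the id B3F2H-XXX are made up, and returning one dict per product is an assumption about what is convenient downstream.

```python
from lxml import etree

# Hypothetical sample document.
xml_tree = etree.fromstring(b"""<products>
  <entry id="B3F2H-STH" name="Widget H" type="widget" price="9.99"/>
  <entry id="B3F2H-STL" name="Widget L" type="widget" price="8.49"/>
</products>""")
products_missing_from_postgresql = ["B3F2H-STL", "B3F2H-XXX"]

# Build the id -> element index once, with a single pass over the tree.
products = {entry.get("id"): entry for entry in xml_tree.xpath("//entry[@id]")}

missing_products_to_add = []
for product_number in products_missing_from_postgresql:
    entry = products.get(product_number)
    if entry is None:
        continue  # not present in this XML file
    # Copy all attributes (id, name, type, price) into a plain dict.
    missing_products_to_add.append(dict(entry.attrib))
```

When several XML files are involved, the same index could be extended by updating the dictionary once per parsed file before running the lookup loop.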