python lxml loop on match get next entry

Question

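I am using lxml to query multiple XML files that contain data elements for various products. This part of the code takes a list of missing product_ids and queries the XML files for each product's data elements.

One of my core problems is that every product_id obtained via xpath is checked against every item in the list products_missing_from_postgresql, which takes forever (hours).

How do I restart the for entry in entries loop once a match is found?

Maybe this isn't the right question... if not, what is the right question?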

# this code is for testing purposes
for product_number in products_missing_from_postgresql:
    for entry in entries:
        product_id = entry.xpath('@id')[0]

        if product_id != product_number:
            print('************************')
            print('current product: ' + product_id)
            print('no match: ' + product_number)
            print('************************')
        else:
            print('************************')
            print('************************')
            print('product to match: ' + product_number)
            print('matched from entry: ' + product_id)
            print('************************')
            print('************************')

Test code output:

************************
************************
product to match: B3F2H-STH 
matched from entry: B3F2H-STH 
************************
************************

************************
current product: B3F2H-STL
no match: B3F2H-STH 
************************

************************
current product: B3F2H-004 
no match: B3F2H-STH 
************************

This code is for production:

for product_number in products_missing_from_postgresql:
    for entry in entries:
        product_id = entry.xpath('@id')[0]

        if product_id != product_number:
            # used for testing
            print('no match: ' + product_number)
        else:
            # the element @id has multiple items linked that I need to acquire.
            product_id = entry.xpath('@id')[0]
            missing_products_to_add.append(product_id)

            product_name = entry.xpath('@name')[0]
            missing_products_to_add.append(product_name)

            product_type = entry.xpath('@type')[0]
            missing_products_to_add.append(product_type)

            product_price = entry.xpath('@price')[0]
            missing_products_to_add.append(product_price)

Answer 1

Don't use the inner for loop; let XPath do the matching instead.

for product_number in products_missing_from_postgresql:
    entries = xml_tree.xpath("//entry[@id = '%s']" % product_number)
    if entries:
        print('FOUND: ' + product_number)
    else:
        print('NOT FOUND: ' + product_number)

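If your product_number can contain a single quote, the code above will break. It is generally better to use a placeholder in the XPath and pass the actual value separately: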

    entries = xml_tree.xpath("//entry[@id = $value]", value=product_number)
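
A minimal, self-contained sketch of the placeholder form, in case it helps to see it run. The sample XML, the names, and the id containing a single quote are all invented for illustration; the only assumption about the real data is that each entry element carries an id attribute. lxml's xpath() accepts XPath variables as keyword arguments:

```python
from lxml import etree

# Hypothetical sample data; the real files will look different.
xml_tree = etree.fromstring(
    '<products>'
    '<entry id="B3F2H-STH" name="Widget H"/>'
    '<entry id="O\'RING-01" name="O-ring"/>'
    '</products>'
)

products_missing_from_postgresql = ["B3F2H-STH", "O'RING-01", "NOT-THERE"]

found = {}
for product_number in products_missing_from_postgresql:
    # $value is bound by lxml, so quotes in product_number need no escaping.
    entries = xml_tree.xpath("//entry[@id = $value]", value=product_number)
    found[product_number] = [e.get("name") for e in entries]

print(found)
```

With the "%s" string formatting version, the id containing a single quote would produce an invalid XPath expression, whereas the variable form treats it as plain data.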

Answer 2

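Try putting your ids into a set and comparing against that. It saves the nested loops and runs the XPath only once per entry, instead of re-querying the tree over and over...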

# collect every id once; membership tests on a set are O(1) on average
ids = {pid for entry in entries for pid in entry.xpath('@id')}
for product_number in products_missing_from_postgresql:
    if product_number in ids:
        pass  # whatever
    else:
        pass  # whatever

If you also want to retrieve the elements, you can build a dictionary instead of a set:

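# entries is already the list of <entry> elements, so index it directly;
# an absolute '//*[@id]' would rescan the whole document for every entry
products = {entry.attrib['id']: entry for entry in entries}
for product_number in products_missing_from_postgresql:
    if product_number in products:
        actual_product = products[product_number]
        # ... read attributes from actual_product here
    else:
        pass  # not present in this XML file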
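
To tie this back to the production code in the question, here is a rough sketch that indexes the entries once and then pulls the four attributes for each missing product. The sample XML and its values are invented, and it assumes each id is unique and that name, type and price are attributes of entry, as the question's xpath calls suggest. Only calls shared by lxml and the standard library's ElementTree are used, so the import falls back if lxml isn't installed:

```python
try:
    from lxml import etree
except ImportError:  # same API for everything used below
    import xml.etree.ElementTree as etree

# Hypothetical sample data standing in for one of the XML files.
root = etree.fromstring(
    '<products>'
    '<entry id="B3F2H-STH" name="Widget H" type="switch" price="1.50"/>'
    '<entry id="B3F2H-STL" name="Widget L" type="switch" price="1.25"/>'
    '</products>'
)
products_missing_from_postgresql = ["B3F2H-STL", "NOT-IN-XML"]

# One pass over the tree: id -> element.
products = {entry.get("id"): entry for entry in root.iter("entry")}

missing_products_to_add = []
for product_number in products_missing_from_postgresql:
    entry = products.get(product_number)  # dict lookup, no inner loop
    if entry is None:
        continue  # this product is not in this file
    # One tuple per product keeps the four values together.
    missing_products_to_add.append(
        tuple(entry.get(attr) for attr in ("id", "name", "type", "price"))
    )

print(missing_products_to_add)
```

Appending one tuple per product, rather than four loose values as in the question, is just a choice made here so that each row stays intact; flatten it if the downstream insert expects a flat list.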
