Getting data from broken XML in Python

I want to get data from an XML response, but its structure seems to be broken.

I have this example URL, whose XML contains data about a product: https://b2b.snapoutdoor.pl/rest/V1/extendvariantstocart/73478

import requests
import json
from xml.etree import ElementTree
from pprint import pprint

response = requests.get(
    "https://b2b.snapoutdoor.pl/rest/V1/extendvariantstocart/86559",
    headers={"Accept": "application/xml"},
)

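# Parse the XML; the text of the root element is a JSON document.
node = ElementTree.fromstring(response.content)

# First decoding step: the element text becomes a dict of strings.
data = json.loads(node.text)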

This returns a dictionary with four keys:

{'jsonChildsConfig': '{"70259":{"id":"70259","name":"Ski Ultra Merino E - '
                     'black\/orange","sku":"610306139887","availableQty":6,"regularPrice":69.2367,"finalPrice":69.2367,"promo":false,"discount":0,"bestDiscount":false,"addToCartUrl":"https:\/\/b2b.snapoutdoor.pl\/checkout\/cart\/add\/uenc\/aHR0cHM6Ly9iMmIuc25hcG91dGRvb3IucGwvcmVzdC9WMS9leHRlbmR2YXJpYW50c3RvY2FydC84NjU1OQ%2C%2C\/product\/86559\/","formKey":"7OWS6VbWucoSg2zg","superAttributes":"36-39 '
                     '","salable":true},"70260":{"id":"70260","name":"Ski '
                     'Ultra Merino E - '
                     'black\/orange","sku":"610306139894","availableQty":7,"regularPrice":69.2367,"finalPrice":69.2367,"promo":false,"discount":0,"bestDiscount":false,"addToCartUrl":"https:\/\/b2b.snapoutdoor.pl\/checkout\/cart\/add\/uenc\/aHR0cHM6Ly9iMmIuc25hcG91dGRvb3IucGwvcmVzdC9WMS9leHRlbmR2YXJpYW50c3RvY2FydC84NjU1OQ%2C%2C\/product\/86559\/","formKey":"7OWS6VbWucoSg2zg","superAttributes":"40-43 '
                     '","salable":true},"70261":{"id":"70261","name":"Ski '
                     'Ultra Merino E - '
                     'black\/orange","sku":"610306139900","availableQty":6,"regularPrice":69.2367,"finalPrice":69.2367,"promo":false,"discount":0,"bestDiscount":false,"addToCartUrl":"https:\/\/b2b.snapoutdoor.pl\/checkout\/cart\/add\/uenc\/aHR0cHM6Ly9iMmIuc25hcG91dGRvb3IucGwvcmVzdC9WMS9leHRlbmR2YXJpYW50c3RvY2FydC84NjU1OQ%2C%2C\/product\/86559\/","formKey":"7OWS6VbWucoSg2zg","superAttributes":"44-47 '
                     '","salable":true},"99060":{"id":"99060","name":"Ski '
                     'Ultra Merino E - '
                     'black\/orange","sku":"610306139917","availableQty":3,"regularPrice":69.24,"finalPrice":69.24,"promo":false,"discount":0,"bestDiscount":false,"addToCartUrl":"https:\/\/b2b.snapoutdoor.pl\/checkout\/cart\/add\/uenc\/aHR0cHM6Ly9iMmIuc25hcG91dGRvb3IucGwvcmVzdC9WMS9leHRlbmR2YXJpYW50c3RvY2FydC84NjU1OQ%2C%2C\/product\/86559\/","formKey":"7OWS6VbWucoSg2zg","superAttributes":"48+ '
                     '","salable":true}}',
 'jsonConfig': 'some data',
 'jsonDefaultPlaceholder': 'https://b2b.snapoutdoor.pl/pub/media/catalog/product/placeholder/',
 'jsonSwatchConfig': 'some data'
}

I am interested in the value of jsonChildsConfig, but when I try to access a key inside it I get TypeError: string indices must be integers, because the value of jsonChildsConfig is a string.

I want to get the sku and availableQty (stock) values for every variant, but because the value is a string I cannot access them with

data['jsonChildsConfig']['70259']['sku']

data['jsonChildsConfig']['70259']['availableQty']

I also tried converting this string with json.loads(), but it did not work.

Can you help me with this?

Using json.loads to convert the value of data['jsonChildsConfig'] into a dictionary should work:

>>> childConfigDetails = json.loads(data['jsonChildsConfig'])
>>> childConfigDetails['70259']['sku']
'610306139887'
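
Since the question asks for the sku and stock of every variant rather than just one, a loop over the decoded dictionary is probably all that is needed. This is only a minimal sketch: the `data` dictionary below is a shortened stand-in for the real response (two variants, three fields each), and the names `child_config` and `stock_by_sku` are made up for illustration.

```python
import json

# Shortened stand-in for the decoded response. As in the question, the
# value of 'jsonChildsConfig' is itself a JSON string, not a dict.
data = {
    "jsonChildsConfig": json.dumps({
        "70259": {"id": "70259", "sku": "610306139887", "availableQty": 6},
        "70260": {"id": "70260", "sku": "610306139894", "availableQty": 7},
    }),
}

# Second decoding step: turn the inner JSON string into a dict.
child_config = json.loads(data["jsonChildsConfig"])

# Map each variant's sku to its available quantity.
stock_by_sku = {
    variant["sku"]: variant["availableQty"]
    for variant in child_config.values()
}

print(stock_by_sku)
```

With the real response, only the construction of `data` changes; the last two steps should stay the same as long as every variant carries both keys.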

To fix your dictionary, you need to apply json.loads to all of its values except 'jsonDefaultPlaceholder', which is not in JSON format:

del data['jsonDefaultPlaceholder']
new_data = {k: json.loads(v) for k, v in data.items() if v}
new_data['jsonChildsConfig']['70259']['sku']

#output: '610306139887'

Or, if you want to convert the keys you are interested in to integers:

del data['jsonDefaultPlaceholder']
new_data2 = {k: {(int(key) if key.isdigit() else key): val for key,val in json.loads(v).items()} for k, v in data.items() if v}
new_data2['jsonChildsConfig'][70259]['sku']

# output: '610306139887'
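
Both snippets above delete 'jsonDefaultPlaceholder' by hand, which works only while that is the single non-JSON value. A possible alternative, sketched here under the assumption that any value that fails to parse should simply be kept as-is, is to try json.loads on each value and fall back to the original. The helper name `decode_values` and the `sample` dictionary are hypothetical and only stand in for the real data.

```python
import json


def decode_values(raw):
    """Return a copy of raw with every JSON-encoded string value decoded."""
    decoded = {}
    for key, value in raw.items():
        try:
            decoded[key] = json.loads(value)
        except (TypeError, ValueError):
            # Not a JSON string (for example a plain URL or None):
            # keep the original value unchanged.
            decoded[key] = value
    return decoded


# Shortened stand-in for the dictionary from the question.
sample = {
    "jsonChildsConfig": '{"70259": {"sku": "610306139887", "availableQty": 6}}',
    "jsonDefaultPlaceholder": "https://b2b.snapoutdoor.pl/pub/media/catalog/product/placeholder/",
}

new_data = decode_values(sample)
print(new_data["jsonChildsConfig"]["70259"]["sku"])
print(new_data["jsonDefaultPlaceholder"])
```

One caveat of this approach: a plain string that happens to be valid JSON, such as "123" or "true", would also be decoded, so it fits best when the non-JSON values are clearly not JSON, like the URL here.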