Getting data from broken XML in Python
I want to get data from XML, but its structure appears to be broken.
I have this example URL: https://b2b.snapoutdoor.pl/rest/V1/extendvariantstocart/73478
The XML contains data about products.
import requests
import json
from xml.etree import ElementTree
from pprint import pprint
response = requests.get(
"https://b2b.snapoutdoor.pl/rest/V1/extendvariantstocart/86559",
headers={"Accept": "application/xml"},
)
node = ElementTree.fromstring(response.content)
data = json.loads(node.text)
This returns a dictionary with four keys:
{'jsonChildsConfig': '{"70259":{"id":"70259","name":"Ski Ultra Merino E - '
'black\/orange","sku":"610306139887","availableQty":6,"regularPrice":69.2367,"finalPrice":69.2367,"promo":false,"discount":0,"bestDiscount":false,"addToCartUrl":"https:\/\/b2b.snapoutdoor.pl\/checkout\/cart\/add\/uenc\/aHR0cHM6Ly9iMmIuc25hcG91dGRvb3IucGwvcmVzdC9WMS9leHRlbmR2YXJpYW50c3RvY2FydC84NjU1OQ%2C%2C\/product\/86559\/","formKey":"7OWS6VbWucoSg2zg","superAttributes":"36-39 '
'","salable":true},"70260":{"id":"70260","name":"Ski '
'Ultra Merino E - '
'black\/orange","sku":"610306139894","availableQty":7,"regularPrice":69.2367,"finalPrice":69.2367,"promo":false,"discount":0,"bestDiscount":false,"addToCartUrl":"https:\/\/b2b.snapoutdoor.pl\/checkout\/cart\/add\/uenc\/aHR0cHM6Ly9iMmIuc25hcG91dGRvb3IucGwvcmVzdC9WMS9leHRlbmR2YXJpYW50c3RvY2FydC84NjU1OQ%2C%2C\/product\/86559\/","formKey":"7OWS6VbWucoSg2zg","superAttributes":"40-43 '
'","salable":true},"70261":{"id":"70261","name":"Ski '
'Ultra Merino E - '
'black\/orange","sku":"610306139900","availableQty":6,"regularPrice":69.2367,"finalPrice":69.2367,"promo":false,"discount":0,"bestDiscount":false,"addToCartUrl":"https:\/\/b2b.snapoutdoor.pl\/checkout\/cart\/add\/uenc\/aHR0cHM6Ly9iMmIuc25hcG91dGRvb3IucGwvcmVzdC9WMS9leHRlbmR2YXJpYW50c3RvY2FydC84NjU1OQ%2C%2C\/product\/86559\/","formKey":"7OWS6VbWucoSg2zg","superAttributes":"44-47 '
'","salable":true},"99060":{"id":"99060","name":"Ski '
'Ultra Merino E - '
'black\/orange","sku":"610306139917","availableQty":3,"regularPrice":69.24,"finalPrice":69.24,"promo":false,"discount":0,"bestDiscount":false,"addToCartUrl":"https:\/\/b2b.snapoutdoor.pl\/checkout\/cart\/add\/uenc\/aHR0cHM6Ly9iMmIuc25hcG91dGRvb3IucGwvcmVzdC9WMS9leHRlbmR2YXJpYW50c3RvY2FydC84NjU1OQ%2C%2C\/product\/86559\/","formKey":"7OWS6VbWucoSg2zg","superAttributes":"48+ '
'","salable":true}}',
'jsonConfig': 'some data',
'jsonDefaultPlaceholder': 'https://b2b.snapoutdoor.pl/pub/media/catalog/product/placeholder/',
'jsonSwatchConfig': 'some data'
}
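I am interested in the value of jsonChildsConfig, but when I try to access the keys inside it, I get
TypeError: string indices must be integers
because the value of jsonChildsConfig is a string.
I want to collect every SKU and stock value from sku and availableQty, but since the value is a string, they cannot be reached with
data['jsonChildsConfig']['70259']['sku']
or
data['jsonChildsConfig']['70259']['availableQty']
I also tried converting this string to JSON with json.loads(), but it did not work.
Can you help me solve this?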
Using json.loads to convert the value of data['jsonChildsConfig'] into a dictionary should work:
>>> childConfigDetails = json.loads(data['jsonChildsConfig'])
>>> childConfigDetails['70259']['sku']
'610306139887'
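Since the question asks for every SKU and stock value, the decoded dictionary can simply be iterated. The following is a minimal sketch, assuming `data` has the shape shown in the question. The function name `sku_stock` is hypothetical, and `sample` is a shortened stand-in for the real response, not actual output from the server.

```python
import json


def sku_stock(data):
    """Return {sku: availableQty} for every variant in data['jsonChildsConfig']."""
    # The value is a JSON document stored as a string, so decode it first.
    variants = json.loads(data["jsonChildsConfig"])
    return {v["sku"]: v["availableQty"] for v in variants.values()}


# Shortened stand-in for the dictionary produced by json.loads(node.text).
sample = {
    "jsonChildsConfig": json.dumps({
        "70259": {"id": "70259", "sku": "610306139887", "availableQty": 6},
        "70260": {"id": "70260", "sku": "610306139894", "availableQty": 7},
    }),
}

stock = sku_stock(sample)
print(stock)
```

With the real response, `sku_stock(data)` should give one entry per variant, keyed by SKU.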
To fix your dictionary, you need to apply json.loads to all of its values except 'jsonDefaultPlaceholder', which is not in JSON format:
del data['jsonDefaultPlaceholder']
new_data = {k: json.loads(v) for k, v in data.items() if v}
new_data['jsonChildsConfig']['70259']['sku']
# output: '610306139887'
Or, if you want to convert the keys you are interested in to integers:
del data['jsonDefaultPlaceholder']
new_data2 = {k: {(int(key) if key.isdigit() else key): val for key,val in json.loads(v).items()} for k, v in data.items() if v}
new_data2['jsonChildsConfig'][70259]['sku']
# output: '610306139887'
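Deleting 'jsonDefaultPlaceholder' by name works for this response, but it discards the placeholder URL and depends on knowing in advance which values are not JSON. A possible alternative, sketched here under the assumption that any value that fails to parse should be kept unchanged, is to attempt json.loads on each value and fall back to the original. The helper name `decode_nested` and the shortened `sample` are hypothetical.

```python
import json


def decode_nested(data):
    """Decode every value that holds JSON text; keep other values unchanged."""
    decoded = {}
    for key, value in data.items():
        try:
            decoded[key] = json.loads(value)
        except (TypeError, ValueError):
            # Not JSON (for example a plain URL) or not a string: keep it as-is.
            # json.JSONDecodeError is a subclass of ValueError.
            decoded[key] = value
    return decoded


# Shortened stand-in for the dictionary produced by json.loads(node.text).
sample = {
    "jsonChildsConfig": '{"70259": {"sku": "610306139887", "availableQty": 6}}',
    "jsonDefaultPlaceholder": "https://b2b.snapoutdoor.pl/pub/media/catalog/product/placeholder/",
}

result = decode_nested(sample)
print(result["jsonChildsConfig"]["70259"]["sku"])
print(result["jsonDefaultPlaceholder"])
```

One caveat: a plain string that happens to be valid JSON, such as "123" or "true", would also be decoded, so this approach fits best when the non-JSON values are clearly not JSON, as with the URL here.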