How do I convert this complex nested dict into a pandas DataFrame?
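I want to convert the result of an API call into a DataFrame. The call returns a nested dictionary, and I have not been able to build a DataFrame from it.
Besides json_normalize, I also tried pd.DataFrame.from_dict, so far without success. I also tried flattening the dictionary, but that did not work either.
I made the call as follows: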
response = requests.request("GET", url, headers=headers, data=payload)
result = response.json()
The output is:
{'mlcSongCode': 'A6457V',
'primaryTitle': 'AIR FORCE ONES',
'membersSongId': '',
'artists': 'TRACK | NELLY, MURPHY LEE, ALI, KYJUAN, TRACK BOYZ',
'propertyId': None,
'akas': [{'akaId': '', 'akaTitle': '', 'akaTitleTypeCode': ''}],
'writers': [{'writerId': '1083561',
'writerLastName': 'SMITH',
'writerFirstName': 'PREMRO VONZELLAIRE',
'writerIPI': '00232478669',
'writerRoleCode': 'ComposerLyricist',
'chainId': 'PSC_337535223',
'chainParentId': ''},
{'writerId': '1858916',
'writerLastName': 'GOODWIN',
'writerFirstName': 'MARLON',
'writerIPI': '',
'writerRoleCode': 'ComposerLyricist',
'chainId': 'PSC_337535224',
'chainParentId': ''},
{'writerId': '1883205',
'writerLastName': 'HAYNES',
'writerFirstName': 'CORNELL',
'writerIPI': '',
'writerRoleCode': 'ComposerLyricist',
'chainId': 'PSC_337535225',
'chainParentId': ''},
{'writerId': '4733138',
'writerLastName': 'LAVELLE',
'writerFirstName': 'CRUMP',
'writerIPI': '',
'writerRoleCode': 'ComposerLyricist',
'chainId': 'PSC_337535226',
'chainParentId': ''}],
'publishers': [{'publisherId': '910354',
'mlcPublisherNumber': None,
'publisherName': 'TENYOR MUSIC',
'publisherIpiNumber': '00263286262',
'publisherRoleCode': 'OriginalPublisher',
'collectionShare': 16.67,
'chainId': 'PSA_311720187',
'chainParentId': 'PSC_311915511',
'administrators': [],
'parentPublishers': [{'publisherId': '377508',
'mlcPublisherNumber': None,
'publisherName': 'ALL MY PUBLISHING LLC',
'publisherIpiNumber': '',
'publisherRoleCode': 'OriginalPublisher',
'collectionShare': 0,
'chainId': 'PSC_311915511',
'chainParentId': 'PSC_337535223|PSC_337535224|PSC_337535225|PSC_337535226',
'administrators': [],
'parentPublishers': []}]},
{'publisherId': '716372',
'mlcPublisherNumber': None,
'publisherName': 'KOBALT MUSIC PUB AMERICA INC',
'publisherIpiNumber': '00503659557',
'publisherRoleCode': 'SubPublisher',
'collectionShare': 50,
'chainId': 'PSA_365023093',
'chainParentId': 'PSC_337535222',
'administrators': [],
'parentPublishers': [{'publisherId': '631204',
'mlcPublisherNumber': None,
'publisherName': 'TARPO MUSIC PUB.',
'publisherIpiNumber': '00419823444',
'publisherRoleCode': 'OriginalPublisher',
'collectionShare': 0,
'chainId': 'PSC_337535222',
'chainParentId': '',
'administrators': [],
'parentPublishers': []}]}],
'iswc': ''}
Then, to build the DataFrame, I used the following code:
df = pd.json_normalize(result)
# df = pd.read_json(result)
print(df)
But it raised an error:
AttributeError: module 'pandas' has no attribute 'json_normalize'
My main goal is to convert this to Excel or CSV format so that it can be read properly.
Start from the result dictionary:
result = {'mlcSongCode': 'A6457V',
'primaryTitle': 'AIR FORCE ONES',
'membersSongId': '',
'artists': 'TRACK | NELLY, MURPHY LEE, ALI, KYJUAN, TRACK BOYZ',
'propertyId': None,
'akas': [{'akaId': '', 'akaTitle': '', 'akaTitleTypeCode': ''}],
'writers': [{'writerId': '1083561',
'writerLastName': 'SMITH',
'writerFirstName': 'PREMRO VONZELLAIRE',
'writerIPI': '00232478669',
'writerRoleCode': 'ComposerLyricist',
'chainId': 'PSC_337535223',
'chainParentId': ''},
{'writerId': '1858916',
'writerLastName': 'GOODWIN',
'writerFirstName': 'MARLON',
'writerIPI': '',
'writerRoleCode': 'ComposerLyricist',
'chainId': 'PSC_337535224',
'chainParentId': ''},
{'writerId': '1883205',
'writerLastName': 'HAYNES',
'writerFirstName': 'CORNELL',
'writerIPI': '',
'writerRoleCode': 'ComposerLyricist',
'chainId': 'PSC_337535225',
'chainParentId': ''},
{'writerId': '4733138',
'writerLastName': 'LAVELLE',
'writerFirstName': 'CRUMP',
'writerIPI': '',
'writerRoleCode': 'ComposerLyricist',
'chainId': 'PSC_337535226',
'chainParentId': ''}],
'publishers': [{'publisherId': '910354',
'mlcPublisherNumber': None,
'publisherName': 'TENYOR MUSIC',
'publisherIpiNumber': '00263286262',
'publisherRoleCode': 'OriginalPublisher',
'collectionShare': 16.67,
'chainId': 'PSA_311720187',
'chainParentId': 'PSC_311915511',
'administrators': [],
'parentPublishers': [{'publisherId': '377508',
'mlcPublisherNumber': None,
'publisherName': 'ALL MY PUBLISHING LLC',
'publisherIpiNumber': '',
'publisherRoleCode': 'OriginalPublisher',
'collectionShare': 0,
'chainId': 'PSC_311915511',
'chainParentId': 'PSC_337535223|PSC_337535224|PSC_337535225|PSC_337535226',
'administrators': [],
'parentPublishers': []}]},
{'publisherId': '716372',
'mlcPublisherNumber': None,
'publisherName': 'KOBALT MUSIC PUB AMERICA INC',
'publisherIpiNumber': '00503659557',
'publisherRoleCode': 'SubPublisher',
'collectionShare': 50,
'chainId': 'PSA_365023093',
'chainParentId': 'PSC_337535222',
'administrators': [],
'parentPublishers': [{'publisherId': '631204',
'mlcPublisherNumber': None,
'publisherName': 'TARPO MUSIC PUB.',
'publisherIpiNumber': '00419823444',
'publisherRoleCode': 'OriginalPublisher',
'collectionShare': 0,
'chainId': 'PSC_337535222',
'chainParentId': '',
'administrators': [],
'parentPublishers': []}]}],
'iswc': ''}
Load it into a DataFrame:
import pandas as pd
df = pd.json_normalize(result)
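This gives a DataFrame with one column per top-level key of result, holding that key's value as the cell value. Here the columns are mlcSongCode, primaryTitle, membersSongId, artists, propertyId, akas, writers, publishers and iswc.
Explode the writers column: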
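The AttributeError in the question usually points to an older pandas release: json_normalize has only been available as pd.json_normalize since pandas 1.0, and before that it lived under pandas.io.json. If upgrading is not an option, a fallback import along these lines may help. This is a minimal sketch; the sample dictionary is a trimmed stand-in for the real response.

```python
import pandas as pd

try:
    # pandas >= 1.0 exposes json_normalize at the top level
    json_normalize = pd.json_normalize
except AttributeError:
    # older releases only provide it under pandas.io.json
    from pandas.io.json import json_normalize

# Trimmed stand-in for the API response (assumption, for illustration only)
sample = {"mlcSongCode": "A6457V", "writers": [{"writerId": "1083561"}]}

df = json_normalize(sample)
print(list(df.columns))
```

Note that the later explode step needs pandas 0.25 or newer, so upgrading pandas is likely the simpler fix where possible.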
df = df.explode('writers').reset_index(drop=True)
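This turns each element of the writers array into its own row, giving a DataFrame with one row per writer.
Normalize the writers JSON into a flat table. This takes each writer's JSON object and expands each of its keys into a column, producing columns such as writerLastName and writerFirstName: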
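To see what explode does in isolation, here is a small sketch on a one-row frame. The frame is a cut-down, hand-built version of the real data, not the actual API response.

```python
import pandas as pd

# One song with a list of two writer dicts in a single cell
df = pd.DataFrame({
    "mlcSongCode": ["A6457V"],
    "writers": [[
        {"writerId": "1083561", "writerLastName": "SMITH"},
        {"writerId": "1858916", "writerLastName": "GOODWIN"},
    ]],
})

# Each list element becomes its own row; the other columns are repeated
exploded = df.explode("writers").reset_index(drop=True)
print(exploded)
```

After this step each cell in writers holds a single dict rather than a list, which is what the normalization step expects.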
normalized = pd.json_normalize(df['writers'])
Join the normalized DataFrame onto the original DataFrame and drop the original writers column:
df = df.join(normalized).drop(columns=['writers'])
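As an alternative to the explode, normalize and join sequence, json_normalize can do the same thing in one call through its record_path and meta parameters. The sketch below uses a trimmed copy of the response; which top-level keys to carry along in meta is an assumption and can be adjusted.

```python
import pandas as pd

# Trimmed copy of the API response (only a few keys kept for brevity)
result = {
    "mlcSongCode": "A6457V",
    "primaryTitle": "AIR FORCE ONES",
    "writers": [
        {"writerId": "1083561", "writerLastName": "SMITH", "writerFirstName": "PREMRO VONZELLAIRE"},
        {"writerId": "1858916", "writerLastName": "GOODWIN", "writerFirstName": "MARLON"},
    ],
}

# record_path: the list to turn into rows
# meta: top-level fields to repeat on every row
writers = pd.json_normalize(
    result,
    record_path="writers",
    meta=["mlcSongCode", "primaryTitle"],
)
print(writers)
```

This yields one row per writer with the song-level fields repeated, and is convenient when each nested list should become its own table.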
Then repeat for the other JSON columns as needed.
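One thing to watch when repeating the steps: writers and publishers both contain chainId and chainParentId, so a plain join would fail on the overlapping column names. The sketch below wraps the three steps in a helper (expand is a hypothetical name, not a pandas function) that prefixes the new columns, then writes the result as CSV. It uses a trimmed copy of the response, and exploding two list columns in turn produces one row per writer and publisher combination, which may or may not be the layout you want.

```python
import io
import pandas as pd

# Trimmed copy of the API response (only a few keys kept for brevity)
result = {
    "mlcSongCode": "A6457V",
    "primaryTitle": "AIR FORCE ONES",
    "writers": [
        {"writerId": "1083561", "writerLastName": "SMITH", "chainId": "PSC_337535223"},
        {"writerId": "1858916", "writerLastName": "GOODWIN", "chainId": "PSC_337535224"},
    ],
    "publishers": [
        {"publisherId": "910354", "publisherName": "TENYOR MUSIC", "chainId": "PSA_311720187"},
        {"publisherId": "716372", "publisherName": "KOBALT MUSIC PUB AMERICA INC", "chainId": "PSA_365023093"},
    ],
}


def expand(df, column):
    """Explode a list-of-dicts column and spread its keys into prefixed columns."""
    df = df.explode(column).reset_index(drop=True)
    # Prefix with the source column name so that shared keys such as chainId do not collide
    normalized = pd.json_normalize(df[column].tolist()).add_prefix(column + ".")
    return df.join(normalized).drop(columns=[column])


df = pd.json_normalize(result)
df = expand(df, "writers")
df = expand(df, "publishers")

# Write CSV to an in-memory buffer; pass a file path instead to save to disk.
# df.to_excel(path) works similarly but needs an Excel engine such as openpyxl installed.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

If a cross product of writers and publishers is not wanted, it may be cleaner to build a separate table per nested list and save each to its own CSV file or Excel sheet.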