How do I convert this complex nested dict into a pandas DataFrame?

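I want to convert the result of an API call into a DataFrame. The call returns a nested dictionary, and I have not been able to build a DataFrame from it.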

Besides json_normalize, I also tried pd.DataFrame.from_dict, but so far nothing has worked. I also tried flattening the dictionary, without success.

I make the following call:

import requests

response = requests.request("GET", url, headers=headers, data=payload)
result = response.json()

The output is:

{'mlcSongCode': 'A6457V',
 'primaryTitle': 'AIR FORCE ONES',
 'membersSongId': '',
 'artists': 'TRACK | NELLY, MURPHY LEE, ALI, KYJUAN, TRACK BOYZ',
 'propertyId': None,
 'akas': [{'akaId': '', 'akaTitle': '', 'akaTitleTypeCode': ''}],
 'writers': [{'writerId': '1083561',
   'writerLastName': 'SMITH',
   'writerFirstName': 'PREMRO VONZELLAIRE',
   'writerIPI': '00232478669',
   'writerRoleCode': 'ComposerLyricist',
   'chainId': 'PSC_337535223',
   'chainParentId': ''},
  {'writerId': '1858916',
   'writerLastName': 'GOODWIN',
   'writerFirstName': 'MARLON',
   'writerIPI': '',
   'writerRoleCode': 'ComposerLyricist',
   'chainId': 'PSC_337535224',
   'chainParentId': ''},
  {'writerId': '1883205',
   'writerLastName': 'HAYNES',
   'writerFirstName': 'CORNELL',
   'writerIPI': '',
   'writerRoleCode': 'ComposerLyricist',
   'chainId': 'PSC_337535225',
   'chainParentId': ''},
  {'writerId': '4733138',
   'writerLastName': 'LAVELLE',
   'writerFirstName': 'CRUMP',
   'writerIPI': '',
   'writerRoleCode': 'ComposerLyricist',
   'chainId': 'PSC_337535226',
   'chainParentId': ''}],
 'publishers': [{'publisherId': '910354',
   'mlcPublisherNumber': None,
   'publisherName': 'TENYOR MUSIC',
   'publisherIpiNumber': '00263286262',
   'publisherRoleCode': 'OriginalPublisher',
   'collectionShare': 16.67,
   'chainId': 'PSA_311720187',
   'chainParentId': 'PSC_311915511',
   'administrators': [],
   'parentPublishers': [{'publisherId': '377508',
     'mlcPublisherNumber': None,
     'publisherName': 'ALL MY PUBLISHING LLC',
     'publisherIpiNumber': '',
     'publisherRoleCode': 'OriginalPublisher',
     'collectionShare': 0,
     'chainId': 'PSC_311915511',
     'chainParentId': 'PSC_337535223|PSC_337535224|PSC_337535225|PSC_337535226',
     'administrators': [],
     'parentPublishers': []}]},
  {'publisherId': '716372',
   'mlcPublisherNumber': None,
   'publisherName': 'KOBALT MUSIC PUB AMERICA INC',
   'publisherIpiNumber': '00503659557',
   'publisherRoleCode': 'SubPublisher',
   'collectionShare': 50,
   'chainId': 'PSA_365023093',
   'chainParentId': 'PSC_337535222',
   'administrators': [],
   'parentPublishers': [{'publisherId': '631204',
     'mlcPublisherNumber': None,
     'publisherName': 'TARPO MUSIC PUB.',
     'publisherIpiNumber': '00419823444',
     'publisherRoleCode': 'OriginalPublisher',
     'collectionShare': 0,
     'chainId': 'PSC_337535222',
     'chainParentId': '',
     'administrators': [],
     'parentPublishers': []}]}],
 'iswc': ''}

Then, to build the DataFrame, I used the following code:

df = pd.json_normalize(result)
# df = pd.read_json(result)
print(df)

But it raises an error:

AttributeError: module 'pandas' has no attribute 'json_normalize'

My main goal is to convert this to Excel or CSV so that it can be read properly.

Start with the result dictionary:

result = {'mlcSongCode': 'A6457V',
 'primaryTitle': 'AIR FORCE ONES',
 'membersSongId': '',
 'artists': 'TRACK | NELLY, MURPHY LEE, ALI, KYJUAN, TRACK BOYZ',
 'propertyId': None,
 'akas': [{'akaId': '', 'akaTitle': '', 'akaTitleTypeCode': ''}],
 'writers': [{'writerId': '1083561',
   'writerLastName': 'SMITH',
   'writerFirstName': 'PREMRO VONZELLAIRE',
   'writerIPI': '00232478669',
   'writerRoleCode': 'ComposerLyricist',
   'chainId': 'PSC_337535223',
   'chainParentId': ''},
  {'writerId': '1858916',
   'writerLastName': 'GOODWIN',
   'writerFirstName': 'MARLON',
   'writerIPI': '',
   'writerRoleCode': 'ComposerLyricist',
   'chainId': 'PSC_337535224',
   'chainParentId': ''},
  {'writerId': '1883205',
   'writerLastName': 'HAYNES',
   'writerFirstName': 'CORNELL',
   'writerIPI': '',
   'writerRoleCode': 'ComposerLyricist',
   'chainId': 'PSC_337535225',
   'chainParentId': ''},
  {'writerId': '4733138',
   'writerLastName': 'LAVELLE',
   'writerFirstName': 'CRUMP',
   'writerIPI': '',
   'writerRoleCode': 'ComposerLyricist',
   'chainId': 'PSC_337535226',
   'chainParentId': ''}],
 'publishers': [{'publisherId': '910354',
   'mlcPublisherNumber': None,
   'publisherName': 'TENYOR MUSIC',
   'publisherIpiNumber': '00263286262',
   'publisherRoleCode': 'OriginalPublisher',
   'collectionShare': 16.67,
   'chainId': 'PSA_311720187',
   'chainParentId': 'PSC_311915511',
   'administrators': [],
   'parentPublishers': [{'publisherId': '377508',
     'mlcPublisherNumber': None,
     'publisherName': 'ALL MY PUBLISHING LLC',
     'publisherIpiNumber': '',
     'publisherRoleCode': 'OriginalPublisher',
     'collectionShare': 0,
     'chainId': 'PSC_311915511',
     'chainParentId': 'PSC_337535223|PSC_337535224|PSC_337535225|PSC_337535226',
     'administrators': [],
     'parentPublishers': []}]},
  {'publisherId': '716372',
   'mlcPublisherNumber': None,
   'publisherName': 'KOBALT MUSIC PUB AMERICA INC',
   'publisherIpiNumber': '00503659557',
   'publisherRoleCode': 'SubPublisher',
   'collectionShare': 50,
   'chainId': 'PSA_365023093',
   'chainParentId': 'PSC_337535222',
   'administrators': [],
   'parentPublishers': [{'publisherId': '631204',
     'mlcPublisherNumber': None,
     'publisherName': 'TARPO MUSIC PUB.',
     'publisherIpiNumber': '00419823444',
     'publisherRoleCode': 'OriginalPublisher',
     'collectionShare': 0,
     'chainId': 'PSC_337535222',
     'chainParentId': '',
     'administrators': [],
     'parentPublishers': []}]}],
 'iswc': ''}

Load it into a DataFrame:

import pandas as pd
df = pd.json_normalize(result)

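This gives a DataFrame with a single row, in which each top-level key of result is a column and the key's value is the cell value. Here the columns are mlcSongCode, primaryTitle, membersSongId, artists, propertyId, akas, writers, publishers and iswc. The akas, writers and publishers cells still hold lists of dictionaries.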
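
The AttributeError in the question usually means the installed pandas predates 1.0, where json_normalize was only importable from pandas.io.json. Upgrading pandas is the simplest fix; if that is not possible, a fallback import along these lines may help. This is a minimal sketch, and the sample dict is a trimmed-down stand-in for the real response:

```python
import pandas as pd

# pd.json_normalize is exposed at the top level from pandas 1.0 onward;
# older releases only provide it under pandas.io.json.
try:
    json_normalize = pd.json_normalize
except AttributeError:
    from pandas.io.json import json_normalize

# Trimmed-down stand-in for the API response, for illustration only.
sample = {"mlcSongCode": "A6457V", "writers": [{"writerId": "1083561"}]}

df = json_normalize(sample)
print(pd.__version__)     # shows which pandas is actually installed
print(list(df.columns))   # top-level keys become columns
```

Note that the explode step used below needs pandas 0.25 or later, so on a very old installation upgrading is unavoidable anyway.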

Explode the writers column:

df = df.explode('writers').reset_index(drop=True)

This turns each element of the writers list into its own row, giving a DataFrame with one row per writer.

Normalize the writers JSON into a flat table. This takes each writer's dictionary and expands each of its keys into a column, for example one column each for 'writerLastName', 'writerFirstName' and so on:

normalized = pd.json_normalize(df['writers'])

Join the normalized DataFrame onto the original DataFrame and drop the original 'writers' column:

df = df.join(normalized).drop(columns=['writers'])
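
Put together, the steps so far can be tried on a cut-down version of the response. This is only a sketch: the sample keeps two writers and two of their keys, and it passes a plain list to json_normalize (via .tolist()) as a cautious choice rather than a requirement.

```python
import pandas as pd

# Trimmed-down sample with two writers (values copied from the question).
result = {
    "mlcSongCode": "A6457V",
    "primaryTitle": "AIR FORCE ONES",
    "writers": [
        {"writerId": "1083561", "writerLastName": "SMITH"},
        {"writerId": "1858916", "writerLastName": "GOODWIN"},
    ],
}

df = pd.json_normalize(result)                           # one row; 'writers' holds a list
df = df.explode("writers").reset_index(drop=True)        # one row per writer dict
normalized = pd.json_normalize(df["writers"].tolist())   # one column per writer key
df = df.join(normalized).drop(columns=["writers"])

print(df.shape)
print(list(df.columns))
```

The reset_index(drop=True) call matters here: without it both exploded rows keep index 0, and the join against normalized (indexed 0 and 1) would no longer line up row by row.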

Then repeat for the other JSON columns as needed.
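
When repeating this for publishers, note that writers and publishers share key names such as chainId and chainParentId, so a plain join would raise an error about overlapping columns. One way around that is a small helper that prefixes the new columns. The helper name expand_list_column and the prefixes below are made up for illustration, and the sample is again a trimmed-down stand-in:

```python
import pandas as pd

def expand_list_column(df, column, prefix):
    """Explode a column holding lists of dicts, then spread each dict into columns."""
    df = df.explode(column).reset_index(drop=True)
    # explode() turns an empty list into NaN, so fall back to an empty dict there.
    records = [rec if isinstance(rec, dict) else {} for rec in df[column]]
    # The prefix keeps keys shared by several lists (e.g. 'chainId') from colliding in join().
    normalized = pd.json_normalize(records).add_prefix(prefix)
    return df.drop(columns=[column]).join(normalized)

# Trimmed-down sample; values copied from the question.
result = {
    "mlcSongCode": "A6457V",
    "writers": [
        {"writerId": "1083561", "chainId": "PSC_337535223"},
        {"writerId": "1858916", "chainId": "PSC_337535224"},
    ],
    "publishers": [
        {"publisherId": "910354", "chainId": "PSA_311720187"},
        {"publisherId": "716372", "chainId": "PSA_365023093"},
    ],
}

df = pd.json_normalize(result)
df = expand_list_column(df, "writers", "writer_")
df = expand_list_column(df, "publishers", "publisher_")

print(df.shape)  # every writer row is paired with every publisher
csv_text = df.to_csv(index=False)  # pass a path such as "song.csv" to write a file instead
print(csv_text)
```

Exploding two independent lists in the same frame pairs every writer with every publisher, which may or may not be what you want in the spreadsheet. If it is not, building one table per list (writers in one file, publishers in another) is probably the cleaner layout. For Excel output, df.to_excel works the same way but needs an Excel writer package such as openpyxl installed.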