Processing API data (JSON) into a single data frame (list of lists of dictionaries)?

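This is a continuation of my previous post, except that now I have API data to work with. I am trying to get the keys Type and Email as columns in a data frame so I can arrive at a final count. My code: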

jsp_full=[]
for p in payloads:
    payload = {"payload": {"segmentId":p}}
    r = requests.post(url,headers = header, json = payload)
    #print(r, r.reason)
    time.sleep(r.elapsed.total_seconds())

    # Skip this segment if the request failed, rather than indexing into None below.
    if r.status_code != 200:
        continue
    json_data = r.json()

    json_keys = json_data['payload']['supporters']

    json_package = []
    jsp_full.append(json_package)
    for row in json_keys:
        SID = row['supporterId']
        Handle = row['contacts']
        a_key = 'value'
        list_values = [a_list[a_key] for a_list in Handle]
        string = str(list_values).split(",")
        data = {
            'SupporterID' : SID,
            'Email' : strip_characters(string[-1]),
            'Type' : labels(p)
        }
        json_package.append(data)



    t2 = round(time.perf_counter(),2)

    b_key = "Email"
    e = len([b_list[b_key] for b_list in json_package])
    t = str(labels(p))

    #print(json_package)
    print(f'There are {e} emails in the {t} segment')
    print(f'Finished in {t2 - t1} seconds')


    excel = pd.DataFrame(json_package)
    excel.to_excel(r'C:\Users\am\Desktop\email parsing\{0} segment {1}.xlsx'.format(t, str(today)), sheet_name=t)

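This part works fine. Each payload in the API represents a different group of people, so I split them into separate files. However, I now need to combine all the records into a single data frame, which is why I append each segment's records to jsp_full. The result is a list of lists of dictionaries.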

Once I have that, I run the rest of my code, which looks like this:

S= pd.DataFrame(jsp_full[0], index = {0})
Advocacy_Supporters = S.sort_values("Type").groupby("Type", as_index=False)["Email"].first()
print(Advocacy_Supporters['Email'].count())

print("The number of Unique Advocacy Supporters is :")
Advocacy_Supporters_Group = Advocacy_Supporters.groupby("Type")["Email"].nunique()
print(Advocacy_Supporters_Group)

Some sample data:

[{'SupporterID': '565f6a2f-c7fd-4f1b-bac2-e33976ef4306', 'Email': 'somebody@somewhere.edu', 'Type': 'd_Student Ambassadors'}, {'SupporterID': '7508dc12-7647-4e95-a8b8-bcb067861faf', 'Email': 'someoneelse@email.somewhere.edu', 'Type': 'd_Student Ambassadors'},...

My desired output is a data frame that looks like this:

SupporterID                           Email                            Type
565f6a2f-c7fd-4f1b-bac2-e33976ef4306  somebody@somewhere.edu           d_Student Ambassadors
7508dc12-7647-4e95-a8b8-bcb067861faf  someoneelse@email.somewhere.edu  d_Student Ambassadors

Any help is greatly appreciated!!

Since this code already creates an Excel file for each segment, what I ended up doing was reading those files back in with a for loop, like this:

filesnames = ['e_S Donors', 'b_Contributors', 'c_Activists', 'd_Student Ambassadors', 'a_Volunteers', 'f_Offline Action Takers']

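frames = []
for i in filesnames:
    data = pd.read_excel(r'C:\Users\am\Desktop\email parsing\{0} segment {1}.xlsx'.format(i, str(today)),sheet_name= i, engine = 'openpyxl')
    frames.append(data)
# DataFrame.append was removed in pandas 2.0; concatenate the per-segment frames instead.
S = pd.concat(frames)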

This worked, because the result is in the format I wanted.
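
A possible alternative that skips the Excel round trip: since jsp_full is a list of lists of dictionaries, it can probably be flattened into one list of dictionaries and handed to pd.DataFrame directly. The following is only a minimal sketch. The records in it are made-up stand-ins for the real API data, and the name unique_per_type is hypothetical.

```python
from itertools import chain

import pandas as pd

# Hypothetical stand-in for jsp_full: one inner list of records per segment.
jsp_full = [
    [
        {"SupporterID": "id-1", "Email": "a@example.org", "Type": "d_Student Ambassadors"},
        {"SupporterID": "id-2", "Email": "b@example.org", "Type": "d_Student Ambassadors"},
    ],
    [
        {"SupporterID": "id-1", "Email": "a@example.org", "Type": "a_Volunteers"},
    ],
]

# Flatten the list of lists into a single list of dicts, then build one frame.
S = pd.DataFrame(list(chain.from_iterable(jsp_full)))

# Count the distinct emails within each Type.
unique_per_type = S.groupby("Type")["Email"].nunique()

print(S)
print(unique_per_type)
```

With the real data, the flattening line should be the only part that needs to run after the request loop finishes; the per-segment Excel files can still be written inside the loop as before.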