How can I transform a series of lists of dictionaries into a dataframe?

I have the following dataframe. Each entry in the genres column is a list of dictionaries.

index  title                                        genres
0      Avatar                                       [{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]
1      Pirates of the Caribbean: At World's End     [{"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 28, "name": "Action"}]
2      Spectre                                      [{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 80, "name": "Crime"}]
3      The Dark Knight Rises                        [{"id": 28, "name": "Action"}, {"id": 80, "name": "Crime"}, {"id": 18, "name": "Drama"}, {"id": 53, "name": "Thriller"}]
4      John Carter                                  [{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 878, "name": "Science Fiction"}]

I would like a dataframe like this:

     Title   Name
     Avatar  Action
     Avatar  Adventure
     Avatar  Fantasy
     Avatar  Science Fiction
     Pirates.. Adventure
     Pirates.. Fantasy
     ...

I hope the question is clear. This is my first time posting a question. Thanks,

import pandas as pd

title = ["Avatar", "Pirates of the Caribbean: At World's End", "Spectre", "The Dark Knight Rises", "John Carter"]
genres = [[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}],
          [{"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 28, "name": "Action"}],
          [{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 80, "name": "Crime"}],
          [{"id": 28, "name": "Action"}, {"id": 80, "name": "Crime"}, {"id": 18, "name": "Drama"}, {"id": 53, "name": "Thriller"}],
          [{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 878, "name": "Science Fiction"}]]
df = pd.DataFrame({"title": title,
                   "genres": genres})

Explode the series of dictionaries:

genres_list = df["genres"].apply(lambda x: [y["name"] for y in x ]).explode()
genres_list

0             Action
0          Adventure
0            Fantasy
0    Science Fiction
1          Adventure
1            Fantasy
1             Action
2             Action
2          Adventure
2              Crime
3             Action
3              Crime
3              Drama
3           Thriller
4             Action
4          Adventure
4    Science Fiction
Name: genres, dtype: object
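
To see why the exploded series above carries a repeated index, here is a minimal sketch on toy data (the series `s` is made up for illustration). Series.explode gives every list element its own row and copies the label of the row it came from:

```python
import pandas as pd

# Toy series: row 0 holds two items, row 1 holds one (illustrative data)
s = pd.Series([["a", "b"], ["c"]])

exploded = s.explode()

# Each element becomes its own row; the original row label is repeated
print(exploded.index.tolist())  # [0, 0, 1]
print(exploded.tolist())        # ['a', 'b', 'c']
```

That repeated label is what lets the exploded genres line up with the repeated titles later on.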

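Expand the titles:

Each element of df["title"] is repeated n_i times, where n_i is the length of the corresponding list of dictionaries. See the documentation for Series.repeat.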

title_rep = df["title"].repeat(df["genres"].apply(len))
title_rep

0                                      Avatar
0                                      Avatar
0                                      Avatar
0                                      Avatar
1    Pirates of the Caribbean: At World's End
1    Pirates of the Caribbean: At World's End
1    Pirates of the Caribbean: At World's End
2                                     Spectre
2                                     Spectre
2                                     Spectre
3                       The Dark Knight Rises
3                       The Dark Knight Rises
3                       The Dark Knight Rises
3                       The Dark Knight Rises
4                                 John Carter
4                                 John Carter
4                                 John Carter
Name: title, dtype: object

Combine:

pd.DataFrame({"title": title_rep,
              "genres": genres_list})

Returns:

   title                                     genres
0  Avatar                                    Action
0  Avatar                                    Adventure
0  Avatar                                    Fantasy
0  Avatar                                    Science Fiction
1  Pirates of the Caribbean: At World's End  Adventure
1  Pirates of the Caribbean: At World's End  Fantasy
1  Pirates of the Caribbean: At World's End  Action
2  Spectre                                   Action
2  Spectre                                   Adventure
2  Spectre                                   Crime
3  The Dark Knight Rises                     Action
3  The Dark Knight Rises                     Crime
3  The Dark Knight Rises                     Drama
3  The Dark Knight Rises                     Thriller
4  John Carter                               Action
4  John Carter                               Adventure
4  John Carter                               Science Fiction
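
If the result should match the layout asked for in the question (columns Title and Name, with a fresh 0..n-1 index instead of the repeated one), the combine step might be adapted as follows. This is only a sketch on a shortened two-film frame; the sample data and the name `result` are illustrative assumptions.

```python
import pandas as pd

# Shortened sample frame (illustrative, not the full data from the question)
df = pd.DataFrame({
    "title": ["Avatar", "Spectre"],
    "genres": [
        [{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}],
        [{"id": 80, "name": "Crime"}],
    ],
})

# Same two steps as above: explode the names, repeat the titles
genres_list = df["genres"].apply(lambda x: [y["name"] for y in x]).explode()
title_rep = df["title"].repeat(df["genres"].apply(len))

# Use the requested column names and drop the duplicated index labels
result = pd.DataFrame({"Title": title_rep, "Name": genres_list}).reset_index(drop=True)
print(result)
```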

Suppose we have a df:

df
    title   genres
0   Avatar  [{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]
1   Pirates of the Caribbean: At World's End    [{"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 28, "name": "Action"}]
2   Spectre [{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 80, "name": "Crime"}]
3   The Dark Knight Rises   [{"id": 28, "name": "Action"}, {"id": 80, "name": "Crime"}, {"id": 18, "name": "Drama"}, {"id": 53, "name": "Thriller"}]
4   John Carter [{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 878, "name": "Science Fiction"}]

Then we can do this:

df["genres"] = df["genres"].apply(lambda row: [genre["name"] for genre in row])
df.explode("genres")
   title                                     genres
0  Avatar                                    Action
0  Avatar                                    Adventure
0  Avatar                                    Fantasy
0  Avatar                                    Science Fiction
1  Pirates of the Caribbean: At World's End  Adventure
1  Pirates of the Caribbean: At World's End  Fantasy
1  Pirates of the Caribbean: At World's End  Action
2  Spectre                                   Action
2  Spectre                                   Adventure
2  Spectre                                   Crime
3  The Dark Knight Rises                     Action
3  The Dark Knight Rises                     Crime
3  The Dark Knight Rises                     Drama
3  The Dark Knight Rises                     Thriller
4  John Carter                               Action
4  John Carter                               Adventure
4  John Carter                               Science Fiction
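
A different route, not taken from the answers here, is pandas.json_normalize, which can flatten the nested records in one call and leaves the original df["genres"] column untouched. The sketch below assumes the genres cells are real Python lists of dicts; the two-film frame and the names `flat` and `result` are made up for illustration.

```python
import pandas as pd

# Shortened sample frame (illustrative data)
df = pd.DataFrame({
    "title": ["Avatar", "Spectre"],
    "genres": [
        [{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}],
        [{"id": 80, "name": "Crime"}],
    ],
})

# One output row per dict under "genres"; "title" is carried along as metadata
flat = pd.json_normalize(df.to_dict("records"), record_path="genres", meta=["title"])

# Keep only the two columns the question asks for
result = flat[["title", "name"]]
print(result)
```

The "id" field is also available in `flat` if it is needed later.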

I would do it like this:

import pandas as pd

df = pd.DataFrame({"title":["Avatar","Spectre"],"genres":[
                    [{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}],
                    [{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 80, "name": "Crime"}]
                    ]})

print(df)

     title                                             genres
0   Avatar  [{'id': 28, 'name': 'Action'}, {'id': 12, 'nam...
1  Spectre  [{'id': 28, 'name': 'Action'}, {'id': 12, 'nam...

Keep only the names from the "genres" column:

df["genres"] = df["genres"].apply(lambda x:[y.get("name") for y in x])

Create a new dataframe containing only the names:

df1 = pd.DataFrame(df["genres"].values.tolist())
df1.columns = ["name_{}".format(x) for x in range(len(df1.columns))]

Combine the two:

df = pd.concat([df[["title"]],df1],axis=1)

Melt:

df.melt(id_vars="title",value_vars=df.columns[1:],value_name="name")[["title","name"]].dropna().set_index("title").sort_index()



                 name
title
Avatar            Action
Avatar         Adventure
Avatar           Fantasy
Avatar   Science Fiction
Spectre           Action
Spectre        Adventure
Spectre            Crime
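
The dropna() in the melt step matters because films with fewer genres are padded with missing values when the lists are spread across columns. A minimal sketch of that intermediate wide frame, using a hand-written list of names as a stand-in for df["genres"] after the names were extracted:

```python
import pandas as pd

# Stand-in for df["genres"].values.tolist() after extracting the names
names = [
    ["Action", "Adventure", "Fantasy", "Science Fiction"],
    ["Action", "Adventure", "Crime"],
]

wide = pd.DataFrame(names)
wide.columns = ["name_{}".format(i) for i in range(len(wide.columns))]
print(wide)

# The shorter row has no fourth genre, so its last cell is missing
print(wide["name_3"].isna().tolist())  # [False, True]
```

Without dropna(), melt would turn that missing cell into an extra row with an empty name.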
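Another option is a plain loop over the rows:

import pandas as pd
import ast

# df is assumed to be the dataframe from the question ('title' and 'genres' columns)
df_list = []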

# Iterate over each row and read the values of the title and genres columns

for index, row in df.iterrows():
    title = row['title']
    gn = row['genres']
    # Parse only when the cell is a string; an actual list is used as-is
    genres = ast.literal_eval(gn) if isinstance(gn, str) else gn

    # One output row per (title, genre name) pair
    for genre in genres:
        df_list.append([title, genre['name']])

out_df = pd.DataFrame(df_list,columns=['Title','Name'])
print(out_df.head())

If the values in the genres column are strings, they must be converted to lists first; that is what ast.literal_eval() is used for.
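
As a sketch of the string case, assume the genres column was read from a CSV and therefore holds text rather than lists (the one-row frame below is made up for illustration). Each cell is parsed with ast.literal_eval, then the (Title, Name) pairs are built in a single comprehension:

```python
import ast

import pandas as pd

# Illustrative frame where "genres" is a string, as it would be after read_csv
df = pd.DataFrame({
    "title": ["Avatar"],
    "genres": ['[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]'],
})

# Turn each string into a real list of dicts
parsed = df["genres"].apply(ast.literal_eval)

# One (title, name) pair per genre dict
rows = [(t, g["name"]) for t, gs in zip(df["title"], parsed) for g in gs]
out = pd.DataFrame(rows, columns=["Title", "Name"])
print(out)
```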