How can I transform a series of lists of dictionaries into a dataframe?
I have the following dataframe. The genres column holds a list of dictionaries in each row.
index title genres
0 Avatar [{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]
1 Pirates of the Caribbean: At World's End [{"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 28, "name": "Action"}]
2 Spectre [{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 80, "name": "Crime"}]
3 The Dark Knight Rises [{"id": 28, "name": "Action"}, {"id": 80, "name": "Crime"}, {"id": 18, "name": "Drama"}, {"id": 53, "name": "Thriller"}]
4 John Carter [{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 878, "name": "Science Fiction"}]
I would like a dataframe like the following:
Title Name
Avatar Action
Avatar Adventure
Avatar Fantasy
Avatar Science Fiction
Pirates.. Adventure
Pirates.. Fantasy
...
I hope the question is clear. This is my first time posting a question.
Thanks,
import pandas as pd
title = ["Avatar", "Pirates of the Caribbean: At World's End", "Spectre", "The Dark Knight Rises", "John Carter"]
genres = [[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}],
[{"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 28, "name": "Action"}],
[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 80, "name": "Crime"}],
[{"id": 28, "name": "Action"}, {"id": 80, "name": "Crime"}, {"id": 18, "name": "Drama"}, {"id": 53, "name": "Thriller"}],
[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 878, "name": "Science Fiction"}]]
df = pd.DataFrame({"title": title,
"genres": genres})
Explode the series of dictionaries:
genres_list = df["genres"].apply(lambda x: [y["name"] for y in x ]).explode()
genres_list
0 Action
0 Adventure
0 Fantasy
0 Science Fiction
1 Adventure
1 Fantasy
1 Action
2 Action
2 Adventure
2 Crime
3 Action
3 Crime
3 Drama
3 Thriller
4 Action
4 Adventure
4 Science Fiction
Name: genres, dtype: object
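Expand the titles: each element of df["title"] is repeated n_i times, where n_i is the length of the corresponding list of dictionaries. See the documentation for Series.repeat.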
title_rep = df["title"].repeat(df["genres"].apply(lambda x: len(x)))
title_rep
0 Avatar
0 Avatar
0 Avatar
0 Avatar
1 Pirates of the Caribbean: At World's End
1 Pirates of the Caribbean: At World's End
1 Pirates of the Caribbean: At World's End
2 Spectre
2 Spectre
2 Spectre
3 The Dark Knight Rises
3 The Dark Knight Rises
3 The Dark Knight Rises
3 The Dark Knight Rises
4 John Carter
4 John Carter
4 John Carter
Name: title, dtype: object
Combine:
pd.DataFrame({"title": title_rep,
"genres": genres_list})
Returns:
title genres
0 Avatar Action
0 Avatar Adventure
0 Avatar Fantasy
0 Avatar Science Fiction
1 Pirates of the Caribbean: At World's End Adventure
1 Pirates of the Caribbean: At World's End Fantasy
1 Pirates of the Caribbean: At World's End Action
2 Spectre Action
2 Spectre Adventure
2 Spectre Crime
3 The Dark Knight Rises Action
3 The Dark Knight Rises Crime
3 The Dark Knight Rises Drama
3 The Dark Knight Rises Thriller
4 John Carter Action
4 John Carter Adventure
4 John Carter Science Fiction
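The three steps above (extract the names, repeat the titles, combine) can be sketched end to end as follows. This is only a minimal illustration on a two-row sample, not the answer's exact code: the column names Title and Name are taken from the question's desired output, and the use of .to_numpy() to combine by position is an assumption of this sketch, made so the duplicated index labels never need to be aligned.

```python
import pandas as pd

# Small sample in the same shape as the question's data.
df = pd.DataFrame({
    "title": ["Avatar", "Spectre"],
    "genres": [
        [{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}],
        [{"id": 80, "name": "Crime"}],
    ],
})

# Step 1: one genre name per row; each row keeps its original index label.
genres_list = df["genres"].apply(lambda x: [y["name"] for y in x]).explode()

# Step 2: repeat each title once per genre in its row.
title_rep = df["title"].repeat(df["genres"].apply(len))

# Both series carry the same repeated index labels in the same order.
assert list(genres_list.index) == list(title_rep.index)

# Step 3: combine by position (assumption of this sketch) and get a fresh 0..n-1 index.
result = pd.DataFrame({
    "Title": title_rep.to_numpy(),
    "Name": genres_list.to_numpy(),
})
print(result)
```

Combining by position works here because explode() and repeat() both emit rows in the original row order, which the assertion checks.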
Suppose we have a df:
df
title genres
0 Avatar [{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]
1 Pirates of the Caribbean: At World's End [{"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 28, "name": "Action"}]
2 Spectre [{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 80, "name": "Crime"}]
3 The Dark Knight Rises [{"id": 28, "name": "Action"}, {"id": 80, "name": "Crime"}, {"id": 18, "name": "Drama"}, {"id": 53, "name": "Thriller"}]
4 John Carter [{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 878, "name": "Science Fiction"}]
Then we can do this:
df["genres"] = df["genres"].apply(lambda row: [genre["name"] for genre in row])
df.explode("genres")
title genres
0 Avatar Action
0 Avatar Adventure
0 Avatar Fantasy
0 Avatar Science Fiction
1 Pirates of the Caribbean: At World's End Adventure
1 Pirates of the Caribbean: At World's End Fantasy
1 Pirates of the Caribbean: At World's End Action
2 Spectre Action
2 Spectre Adventure
2 Spectre Crime
3 The Dark Knight Rises Action
3 The Dark Knight Rises Crime
3 The Dark Knight Rises Drama
3 The Dark Knight Rises Thriller
4 John Carter Action
4 John Carter Adventure
4 John Carter Science Fiction
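Note that the approach above overwrites df["genres"] and keeps the repeated index labels. A possible variant, sketched here under the assumption that you want the original dataframe left untouched and the columns named Title and Name as in the question, chains assign(), explode(), rename() and reset_index():

```python
import pandas as pd

# Small sample in the same shape as the question's data.
df = pd.DataFrame({
    "title": ["Avatar", "Spectre"],
    "genres": [
        [{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}],
        [{"id": 80, "name": "Crime"}],
    ],
})

out = (
    # assign() returns a new frame, so df itself still holds the dictionaries.
    df.assign(genres=df["genres"].apply(lambda row: [g["name"] for g in row]))
      .explode("genres")                                      # one row per genre name
      .rename(columns={"title": "Title", "genres": "Name"})  # names from the question
      .reset_index(drop=True)                                 # replace 0, 0, 1 with 0, 1, 2
)
print(out)
```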
I would do it this way:
import pandas as pd
df = pd.DataFrame({"title":["Avatar","Spectre"],"genres":[
[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}],
[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 80, "name": "Crime"}]
]})
print(df)
title genres
0 Avatar [{'id': 28, 'name': 'Action'}, {'id': 12, 'nam...
1 Spectre [{'id': 28, 'name': 'Action'}, {'id': 12, 'nam...
Get only the names from the "genres" column:
df["genres"] = df["genres"].apply(lambda x:[y.get("name") for y in x])
Create a new dataframe containing only the names:
df1 = pd.DataFrame(df["genres"].values.tolist())
df1.columns = ["name_{}".format(x) for x in range(len(df1.columns))]
Combine the two:
df = pd.concat([df[["title"]],df1],axis=1)
Melt:
df.melt(id_vars="title",value_vars=df.columns[1:],value_name="name")[["title","name"]].dropna().set_index("title").sort_index()
name
title
Avatar Action
Avatar Adventure
Avatar Fantasy
Avatar Science Fiction
Spectre Action
Spectre Adventure
Spectre Crime
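The dropna() in the melt step matters because titles with fewer genres are padded with missing values when the lists are spread into columns. A small sketch of that effect, using made-up variable names (wide, long_df, tidy) that are not part of the answer:

```python
import pandas as pd

# Ragged lists of names: the second row is one item shorter than the first.
names = [["Action", "Adventure", "Fantasy"], ["Action", "Crime"]]

wide = pd.DataFrame(names)
wide.columns = ["name_{}".format(i) for i in range(len(wide.columns))]
wide.insert(0, "title", ["Avatar", "Spectre"])

# Melting turns every name_* cell into its own row, padding included.
long_df = wide.melt(id_vars="title", value_name="name")
n_missing = int(long_df["name"].isna().sum())

# Dropping the padded rows leaves one row per real (title, name) pair.
tidy = long_df.dropna(subset=["name"])[["title", "name"]]
print(n_missing, len(tidy))
```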
import pandas as pd
import ast

# df is assumed to be your existing dataframe with 'title' and 'genres' columns
df_list = []
# Iterate through each row and get the values of the title and genres columns
for index, row in df.iterrows():
    title = row['title']
    gn = row['genres']
    # Parse the value only when it is stored as a string
    genres = ast.literal_eval(gn) if isinstance(gn, str) else gn
    for genre in genres:
        df_list.append([title, genre['name']])
out_df = pd.DataFrame(df_list, columns=['Title', 'Name'])
print(out_df.head())
If the values in the genres column are strings, they need to be converted to lists first; that is what ast.literal_eval() is used for.
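The string case can also be handled without an explicit loop. The following is a rough sketch, assuming the genres column holds string representations of the lists (for example after reading the data from a CSV file); the helper name to_names is hypothetical and introduced only for illustration.

```python
import ast

import pandas as pd

# Sample where each genres value is a string, not a list of dictionaries.
df = pd.DataFrame({
    "title": ["Avatar", "Spectre"],
    "genres": [
        '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]',
        '[{"id": 80, "name": "Crime"}]',
    ],
})


def to_names(value):
    """Return the genre names, parsing the value first if it is a string."""
    records = ast.literal_eval(value) if isinstance(value, str) else value
    return [record["name"] for record in records]


out = (
    df.assign(Name=df["genres"].apply(to_names))
      .explode("Name")                                   # one row per genre name
      .rename(columns={"title": "Title"})[["Title", "Name"]]
      .reset_index(drop=True)
)
print(out)
```

Because to_names() checks the type before parsing, the same sketch should also work when the column already contains real lists.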