How to combine rows into a single row in a DataFrame using SQL or Python
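I want to aggregate the rows of specific columns based on their relationship to other columns, and create a column that holds the aggregated data in JSON format.
Here is an example.
Original data table: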
Child Name Child Age Father Name Father Age
Peter 5 Richard 40
James 15 Doug 45
Liz 2 Doug 45
Paul 6 Richard 40
Shirly 11 Charles 33
Eva 9 Chris 29
The transformed data table would be:
Father Name Father Age Children
Richard 40 {"Peter":"5", "Paul":"6"}
Doug 45 {"James":"15","Liz":"2"}
Charles 33 {"Shirly" : "11"}
Chris 29 {"Eva" : "9"}
or
Father Name Father Age Children Name Children Age
Richard 40 {"Peter", "Paul"} {"5","6"}
Doug 45 {"James", "Liz"} {"15","2"}
Charles 33 {"Shirly"} {"11"}
Chris 29 {"Eva"} {"9"}
My code is:
import pandas as pd
df = pd.DataFrame({
"Child Name" : ["Peter","James","Liz","Paul","Shirly","Eva"],
"Child Age" : ["5","15","2","6","11","9"],
"Father Name" : ["Richard","Doug","Doug","Richard","Charles","Chris"],
"Father Age" : ["40","45","45","40","33","29"] })
print(df)
g1 = df.groupby(["Father Name"])["Child Name"].apply(", ".join).reset_index()
g1.columns = ['Father Name','Children Name']
print(g1)
The output is:
Father Name Children Name
0 Charles Shirly
1 Chris Eva
2 Doug James, Liz
3 Richard Peter, Paul
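I don't know how to add "Father Age" and "Children Age" as columns.
What is the most efficient way to do this transformation in a DataFrame?
I want to avoid Python loops because they take a long time to process.
Thanks,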
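A quick and dirty, inefficient hack, but it avoids for loops. Hopefully there is a better solution; I assume the multiple df copies and multiple merges could be simplified.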
import pandas as pd
df = pd.DataFrame({
"Child Name" : ["Peter","James","Liz","Paul","Shirly","Eva"],
"Child Age" : ["5","15","2","6","11","9"],
"Father Name" : ["Richard","Doug","Doug","Richard","Charles","Chris"],
"Father Age" : ["40","45","45","40","33","29"] })
g2 = df.groupby(['Father Name'])["Child Name"].apply(list).reset_index()
g3 = df.groupby(['Father Name'])["Child Age"].apply(list).reset_index()
g4 = df[["Father Name", "Father Age"]].drop_duplicates()
df2 = g2.merge(g4)
df2 = df2.merge(g3)
print(df2)
Output:
Father Name Child Name Father Age Child Age
0 Charles [Shirly] 33 [11]
1 Chris [Eva] 29 [9]
2 Doug [James, Liz] 45 [15, 2]
3 Richard [Peter, Paul] 40 [5, 6]
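The copies and merges above can probably be collapsed into a single groupby. The following is a minimal sketch, not a benchmarked solution. It assumes each father always appears with the same "Father Age", so both columns can serve as grouping keys, and that child names are unique per father (duplicates would overwrite each other in the dict). The names keys, lists and children are only illustrative. The first result gives the two list columns, the second gives the single JSON "Children" column from the question.

```python
import json
import pandas as pd

df = pd.DataFrame({
    "Child Name": ["Peter", "James", "Liz", "Paul", "Shirly", "Eva"],
    "Child Age": ["5", "15", "2", "6", "11", "9"],
    "Father Name": ["Richard", "Doug", "Doug", "Richard", "Charles", "Chris"],
    "Father Age": ["40", "45", "45", "40", "33", "29"],
})

# Assumption: "Father Age" is constant per father, so grouping on both
# columns carries the age through without a separate merge.
keys = ["Father Name", "Father Age"]

# Variant with two list columns: collect names and ages in one groupby.
# sort=False keeps fathers in order of first appearance.
lists = (
    df.groupby(keys, sort=False)[["Child Name", "Child Age"]]
      .agg(list)
      .reset_index()
      .rename(columns={"Child Name": "Children Name",
                       "Child Age": "Children Age"})
)
print(lists)

# Variant with one JSON column: index by child name so that each group's
# Series maps name -> age, then serialize that mapping with json.dumps.
children = (
    df.set_index("Child Name")
      .groupby(keys, sort=False)["Child Age"]
      .apply(lambda s: json.dumps(s.to_dict()))
      .reset_index(name="Children")
)
print(children)
```

The lambda still runs once per father in Python, so this removes the explicit loop and the merges rather than all per-group overhead. If only the list columns are needed, the first variant avoids the lambda entirely.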