How to combine rows and put them into a single row in a DataFrame with SQL or Python

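I want to aggregate the rows of certain columns based on their relationship to other columns, and create a new column that holds the aggregated data in JSON format.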

Here is an example.

Original data table:

Child Name     Child Age    Father Name    Father Age
     Peter             5        Richard            40
     James            15           Doug            45
       Liz             2           Doug            45
      Paul             6        Richard            40
    Shirly            11        Charles            33
       Eva             9          Chris            29

The transformed data table would be:

Father Name    Father Age     Children 
    Richard            40     {"Peter":"5", "Paul":"6"}
       Doug            45     {"James":"15","Liz":"2"}
    Charles            33     {"Shirly" : "11"}
      Chris            29     {"Eva" : "9"}
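A minimal sketch of the JSON-column version, assuming that each father has exactly one age (so "Father Age" can be used as a second grouping key) and that a JSON string per row is acceptable. It builds a {child name: child age} dict for each group and serializes it with `json.dumps`; `sort=False` keeps fathers in the order they first appear.

```python
import json

import pandas as pd

df = pd.DataFrame({
    "Child Name": ["Peter", "James", "Liz", "Paul", "Shirly", "Eva"],
    "Child Age": ["5", "15", "2", "6", "11", "9"],
    "Father Name": ["Richard", "Doug", "Doug", "Richard", "Charles", "Chris"],
    "Father Age": ["40", "45", "45", "40", "33", "29"],
})

# Group on both father columns (assumes one age per father) and,
# for each group, zip child names with child ages into a JSON object string.
children = (
    df.groupby(["Father Name", "Father Age"], sort=False)[["Child Name", "Child Age"]]
      .apply(lambda g: json.dumps(dict(zip(g["Child Name"], g["Child Age"]))))
      .reset_index(name="Children")
)
print(children)
```

Note that duplicate child names within one family would collapse into a single key, since JSON objects (and Python dicts) cannot hold repeated keys.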

Or, with the names and ages in separate columns:

Father Name    Father Age     Children Name       Children Age
    Richard            40     {"Peter", "Paul"}      {"5","6"}
       Doug            45     {"James", "Liz"}      {"15","2"}
    Charles            33     {"Shirly"}                {"11"}
      Chris            29     {"Eva"}                    {"9"}

My code is:

import pandas as pd
df = pd.DataFrame({
    "Child Name" : ["Peter","James","Liz","Paul","Shirly","Eva"],
    "Child Age" : ["5","15","2","6","11","9"],
    "Father Name" : ["Richard","Doug","Doug","Richard","Charles","Chris"],
    "Father Age" : ["40","45","45","40","33","29"] })

print(df)

# Join the children's names for each father into one comma-separated string
g1 = df.groupby(["Father Name"])["Child Name"].apply(", ".join).reset_index()
g1.columns = ['Father Name','Children Name']
print(g1)
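One way to extend this same approach, sketched under the assumption that each father has exactly one age: group on both "Father Name" and "Father Age" so the age is carried into the result as part of the group key, and aggregate both child columns with `", ".join` in one call.

```python
import pandas as pd

df = pd.DataFrame({
    "Child Name": ["Peter", "James", "Liz", "Paul", "Shirly", "Eva"],
    "Child Age": ["5", "15", "2", "6", "11", "9"],
    "Father Name": ["Richard", "Doug", "Doug", "Richard", "Charles", "Chris"],
    "Father Age": ["40", "45", "45", "40", "33", "29"],
})

# Grouping on both father columns keeps "Father Age" in the output
# (assumes the same father never appears with two different ages).
g1 = (
    df.groupby(["Father Name", "Father Age"])[["Child Name", "Child Age"]]
      .agg(", ".join)  # join each child column within each group
      .reset_index()
)
g1.columns = ["Father Name", "Father Age", "Children Name", "Children Age"]
print(g1)
```

`", ".join` only works on string values; the ages here are already strings, but numeric ages would need `.astype(str)` first.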

The output is:

  Father Name   Children Name
0     Charles          Shirly
1       Chris             Eva
2        Doug      James, Liz
3     Richard     Peter, Paul

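I don't know how to add "Father Age" and "Children Age" as columns. What is the most efficient way to do this transformation on a DataFrame? I want to avoid Python loops because they take too long to process.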
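Since the title also mentions SQL, here is a hedged sketch of the SQL route, assuming SQLite via Python's built-in `sqlite3` module and an in-memory table named `family` (a hypothetical name chosen for this example). `GROUP_CONCAT(column, separator)` does the row-combining inside the database, and pandas only loads and reads back the data.

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({
    "Child Name": ["Peter", "James", "Liz", "Paul", "Shirly", "Eva"],
    "Child Age": ["5", "15", "2", "6", "11", "9"],
    "Father Name": ["Richard", "Doug", "Doug", "Richard", "Charles", "Chris"],
    "Father Age": ["40", "45", "45", "40", "33", "29"],
})

con = sqlite3.connect(":memory:")
df.to_sql("family", con, index=False)  # hypothetical table name "family"

# Identifiers containing spaces must be double-quoted in SQLite.
query = """
SELECT "Father Name",
       "Father Age",
       GROUP_CONCAT("Child Name", ', ') AS "Children Name",
       GROUP_CONCAT("Child Age", ', ')  AS "Children Age"
FROM family
GROUP BY "Father Name", "Father Age"
ORDER BY "Father Name"
"""
out = pd.read_sql_query(query, con)
con.close()
print(out)
```

SQLite does not guarantee the order of values inside `GROUP_CONCAT`, so the name and age lists are not formally guaranteed to line up with each other; the pandas versions keep the original row order within each group.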

Thanks,

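A quick, dirty, and inefficient hack, but it avoids for loops. Hopefully there is a better solution; I suspect the multiple DataFrame copies and multiple merges could be simplified.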

import pandas as pd
df = pd.DataFrame({
    "Child Name" : ["Peter","James","Liz","Paul","Shirly","Eva"],
    "Child Age" : ["5","15","2","6","11","9"],
    "Father Name" : ["Richard","Doug","Doug","Richard","Charles","Chris"],
    "Father Age" : ["40","45","45","40","33","29"] })

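# One list of child names and one list of child ages per father
g2 = df.groupby(['Father Name'])["Child Name"].apply(list).reset_index()
g3 = df.groupby(['Father Name'])["Child Age"].apply(list).reset_index()
# One row per father with his age
g4 = df[["Father Name", "Father Age"]].drop_duplicates()

# Join the pieces back together on the shared "Father Name" column
df2 = g2.merge(g4)
df2 = df2.merge(g3)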
print(df2)

Output:

  Father Name     Child Name Father Age Child Age
0     Charles       [Shirly]         33      [11]
1       Chris          [Eva]         29       [9]
2        Doug   [James, Liz]         45   [15, 2]
3     Richard  [Peter, Paul]         40    [5, 6]
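The three intermediate DataFrames and two merges can likely be collapsed into a single `groupby(...).agg(...)` call using named aggregation. This sketch assumes each father has one age, so taking the `"first"` value per group is safe; the `**{...}` unpacking is only needed because the column names contain spaces.

```python
import pandas as pd

df = pd.DataFrame({
    "Child Name": ["Peter", "James", "Liz", "Paul", "Shirly", "Eva"],
    "Child Age": ["5", "15", "2", "6", "11", "9"],
    "Father Name": ["Richard", "Doug", "Doug", "Richard", "Charles", "Chris"],
    "Father Age": ["40", "45", "45", "40", "33", "29"],
})

# One groupby pass replaces g2, g3, g4 and both merges.
df2 = (
    df.groupby("Father Name")
      .agg(**{
          "Father Age": ("Father Age", "first"),  # assumes one age per father
          "Child Name": ("Child Name", list),      # names in original row order
          "Child Age": ("Child Age", list),        # ages in the same order
      })
      .reset_index()
)
print(df2)
```

This produces the same columns as the merged result above, with "Father Age" placed directly after "Father Name".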