How can I do the following analysis with Python pandas?

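I have a dataset of 50,000 farmers who grow crops in certain villages. I need to find out how many farmers hold land under the same survey number, and how much crop area each of them has [output image attached].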

Here is my dummy dataset:

df
Out[5]: 
       Name    Village  Survey_no  Land_Area
0  Farmer_1  Village_1         26       0.33
1  Farmer_1  Village_1         26       0.40
2  Farmer_2  Village_1         26       0.30
3  Farmer_2  Village_1         26       0.40
4  Farmer_2  Village_1         26       0.50
5  Farmer_3  Village_1         26       0.52
6  Farmer_3  Village_1         26       0.40
7  Farmer_4  Village_1        151       0.23
8  Farmer_5  Village_1        151       0.25
9  Farmer_5  Village_1        151       0.10

This is the actual output I need.

This is what I have so far:

df = (df.set_index(['Village','Survey_no', df.groupby(['Village','Survey_no']).cumcount().add(1)]).unstack().sort_index(axis=1, level=1))
df.columns = ['{}-{}'.format(x, y) for x, y in df.columns]

df = df.reset_index()


df

Village  Survey_no  Land_Area-1    ...       Name-6  Land_Area-7    Name-7
0  Village_1         26         0.33    ...     Farmer_3          0.4  Farmer_3
1  Village_1        151         0.23    ...          NaN          NaN       NaN

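This output is not correct: it gives me neither the farmer-wise total area within the same survey number nor the number of farmers on the same survey number.

This is as far as my experience and skill take me. For joining bbb and aaa I could only think of overly complicated solutions, which I don't like.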

bbb = df.groupby(['Name'])['Land_Area'].aggregate(['sum'])
aaa = df.groupby(['Village', 'Survey_no']).aggregate({'Land_Area': 'sum', 'Name': 'nunique'}).reset_index()
aaa = aaa.rename(columns={"Name": "No.of Farmers"})

Output of bbb:

           sum
Name          
Farmer_1  0.73
Farmer_2  1.20
Farmer_3  0.92
Farmer_4  0.23
Farmer_5  0.35

Output of aaa:

     Village  Survey_no  Land_Area  No.of Farmers
0  Village_1         26       2.85              3
1  Village_1        151       0.58              2
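One possible way to combine the two results without much machinery is a plain merge on the survey keys. The sketch below is only an illustration: it assumes the per-farmer totals should be computed within each (Village, Survey_no) pair rather than by Name alone (grouping by Name alone would mix survey numbers if a farmer holds land under more than one), and the names per_farmer, per_survey, combined, Total_Land_Area and No_of_Farmers are made up for this example.

```python
import pandas as pd

# Dummy data copied from the question.
df = pd.DataFrame({
    "Name": ["Farmer_1", "Farmer_1", "Farmer_2", "Farmer_2", "Farmer_2",
             "Farmer_3", "Farmer_3", "Farmer_4", "Farmer_5", "Farmer_5"],
    "Village": ["Village_1"] * 10,
    "Survey_no": [26] * 7 + [151] * 3,
    "Land_Area": [0.33, 0.40, 0.30, 0.40, 0.50, 0.52, 0.40, 0.23, 0.25, 0.10],
})

keys = ["Village", "Survey_no"]

# Per-farmer totals inside each survey number (long format, one row per farmer).
per_farmer = df.groupby(keys + ["Name"], as_index=False)["Land_Area"].sum()

# Per-survey summary: total area and number of distinct farmers.
per_survey = (
    df.groupby(keys)
      .agg(Total_Land_Area=("Land_Area", "sum"),
           No_of_Farmers=("Name", "nunique"))
      .reset_index()
)

# Attach the survey-level summary to every farmer row.
combined = per_farmer.merge(per_survey, on=keys, how="left")
print(combined)
```

This keeps the result in long format (one row per farmer per survey number), which is usually easier to filter and aggregate further than a wide table.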

Update:

dfs= df.groupby(['Name', 'Village', 'Survey_no']).agg('sum')
dfs = dfs.reset_index(level=0).set_index([dfs.groupby(['Village', 'Survey_no']).cumcount() + 1], append=True)\
         .unstack().sort_index(level=1, axis=1)
dfs.columns = [f'{i}_{j}' for i, j in dfs.columns]
dfs = dfs.assign(Total_Land_Area=dfs.filter(like='Land_Area').sum(axis=1))
dfs

Output:

                     Land_Area_1    Name_1  Land_Area_2    Name_2  Land_Area_3    Name_3  Total_Land_Area
Village   Survey_no                                                                                      
Village_1 26                0.73  Farmer_1         1.20  Farmer_2         0.92  Farmer_3             2.85
          151               0.23  Farmer_4         0.35  Farmer_5          NaN       NaN             0.58
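The question also asks for the number of farmers per survey number, which the table above does not show. A step-by-step variant of the same idea that adds such a column might look like the sketch below. The intermediate names (per_farmer, slot, wide) and the No_of_Farmers column are assumptions chosen for illustration, not part of the original answer.

```python
import pandas as pd

# Dummy data copied from the question.
df = pd.DataFrame({
    "Name": ["Farmer_1", "Farmer_1", "Farmer_2", "Farmer_2", "Farmer_2",
             "Farmer_3", "Farmer_3", "Farmer_4", "Farmer_5", "Farmer_5"],
    "Village": ["Village_1"] * 10,
    "Survey_no": [26] * 7 + [151] * 3,
    "Land_Area": [0.33, 0.40, 0.30, 0.40, 0.50, 0.52, 0.40, 0.23, 0.25, 0.10],
})

keys = ["Village", "Survey_no"]

# 1. Collapse to one row per farmer within each survey number.
per_farmer = df.groupby(keys + ["Name"], as_index=False)["Land_Area"].sum()

# 2. Number the farmers 1, 2, 3, ... inside each survey number.
per_farmer["slot"] = per_farmer.groupby(keys).cumcount() + 1

# 3. Pivot the slots into columns and flatten the column labels.
wide = (
    per_farmer.set_index(keys + ["slot"])
              .unstack("slot")
              .sort_index(axis=1, level=1)
)
wide.columns = [f"{col}_{slot}" for col, slot in wide.columns]

# 4. Row-wise summaries: total area and count of non-missing farmer names.
wide["Total_Land_Area"] = wide.filter(like="Land_Area").sum(axis=1)
wide["No_of_Farmers"] = wide.filter(like="Name_").notna().sum(axis=1)
print(wide)
```

Counting the non-missing Name_ columns works here because step 1 already guarantees one slot per distinct farmer.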

Try this:

cnt = df.groupby(['Village', 'Survey_no']).cumcount()+1
dfs= df.groupby(['Village', 'Survey_no', cnt]).agg({'Name':'first',
                                              'Land_Area':'sum'})\
  .unstack()\
  .sort_index(level=1, axis=1)

dfs = dfs.assign(Total_Land_Area=dfs.filter(like='Land_Area').sum(axis=1))
dfs.columns = [f'{i}_{j}' if j else f'{i}' for i, j in dfs.columns]
dfs

Output:

                     Land_Area_1    Name_1  ...    Name_7 Total_Land_Area
Village   Survey_no                         ...                          
Village_1 26                0.33  Farmer_1  ...  Farmer_3            2.85
          151               0.23  Farmer_4  ...       NaN            0.58

[2 rows x 15 columns]
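The 15 columns in this version (up to Name_7) appear to come from cumcount numbering every row within a survey number, not every distinct farmer, so a farmer with several plots occupies several slots. A small check along these lines can make that visible; the variable names are hypothetical and chosen only for this sketch.

```python
import pandas as pd

# Dummy data copied from the question.
df = pd.DataFrame({
    "Name": ["Farmer_1", "Farmer_1", "Farmer_2", "Farmer_2", "Farmer_2",
             "Farmer_3", "Farmer_3", "Farmer_4", "Farmer_5", "Farmer_5"],
    "Village": ["Village_1"] * 10,
    "Survey_no": [26] * 7 + [151] * 3,
    "Land_Area": [0.33, 0.40, 0.30, 0.40, 0.50, 0.52, 0.40, 0.23, 0.25, 0.10],
})

keys = ["Village", "Survey_no"]

# cumcount gives a running row number inside each (Village, Survey_no) group.
row_slots = df.groupby(keys).cumcount() + 1

# Highest slot number per survey = number of rows, not number of farmers.
rows_per_survey = row_slots.groupby(df["Survey_no"]).max()

# Number of distinct farmers per survey, for comparison.
farmers_per_survey = df.groupby("Survey_no")["Name"].nunique()

print(pd.DataFrame({"rows": rows_per_survey, "farmers": farmers_per_survey}))
```

If one column pair per farmer is wanted, the rows need to be summed per farmer before numbering, as the updated version at the top of this answer does.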