How to do the following analysis and calculation using Python pandas
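I have a data set of 50,000 farmers who grow crops in some villages. I need to find out how many farmers hold land under the same survey number and how much crop area each of them has [output image attached].
Here is my dummy data set: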
df
Out[5]:
Name Village Survey_no Land_Area
0 Farmer_1 Village_1 26 0.33
1 Farmer_1 Village_1 26 0.40
2 Farmer_2 Village_1 26 0.30
3 Farmer_2 Village_1 26 0.40
4 Farmer_2 Village_1 26 0.50
5 Farmer_3 Village_1 26 0.52
6 Farmer_3 Village_1 26 0.40
7 Farmer_4 Village_1 151 0.23
8 Farmer_5 Village_1 151 0.25
9 Farmer_5 Village_1 151 0.10
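To make the example reproducible, the dummy data set above can be rebuilt roughly as follows. This is only a sketch: the values are copied from the printed table, and the dtypes (integer `Survey_no`, float `Land_Area`) are an assumption.

```python
import pandas as pd

# Values copied from the printed table; dtypes are assumed.
df = pd.DataFrame({
    'Name': ['Farmer_1', 'Farmer_1', 'Farmer_2', 'Farmer_2', 'Farmer_2',
             'Farmer_3', 'Farmer_3', 'Farmer_4', 'Farmer_5', 'Farmer_5'],
    'Village': ['Village_1'] * 10,
    'Survey_no': [26, 26, 26, 26, 26, 26, 26, 151, 151, 151],
    'Land_Area': [0.33, 0.40, 0.30, 0.40, 0.50, 0.52, 0.40, 0.23, 0.25, 0.10],
})

print(df)
```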
Here is the actual output I need:
This is what I have so far:
df = (df.set_index(['Village','Survey_no', df.groupby(['Village','Survey_no']).cumcount().add(1)]).unstack().sort_index(axis=1, level=1))
df.columns = ['{}-{}'.format(x, y) for x, y in df.columns]
df = df.reset_index()
df
Village Survey_no Land_Area-1 ... Name-6 Land_Area-7 Name-7
0 Village_1 26 0.33 ... Farmer_3 0.4 Farmer_3
1 Village_1 151 0.23 ... NaN NaN NaN
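The output is not correct, because I am not getting the total area per farmer for the same piece of land (survey number), nor the number of farmers on that land.
That is as far as my experience and skills go. How can I join bbb and aaa? The only solution I came up with is overly complicated, and I don't like it.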
bbb = df.groupby(['Name'])['Land_Area'].aggregate(['sum'])
aaa = df.groupby(['Village', 'Survey_no']).aggregate({'Land_Area': 'sum', 'Name': 'nunique'}).reset_index()
aaa = aaa.rename(columns={"Name": "No.of Farmers"})
Output of bbb:
sum
Name
Farmer_1 0.73
Farmer_2 1.20
Farmer_3 0.92
Farmer_4 0.23
Farmer_5 0.35
Output of aaa:
Village Survey_no Land_Area No.of Farmers
0 Village_1 26 2.85 3
1 Village_1 151 0.58 2
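One possible way to combine the two results is to compute the per-farmer totals within each survey number (rather than by `Name` alone) and merge them with the per-survey summary on `Village` and `Survey_no`. This is a minimal sketch, not necessarily the intended solution; the names `per_farmer`, `summary`, `combined`, `Farmer_Land_Area`, `Total_Land_Area` and `No_of_Farmers` are made up for illustration.

```python
import pandas as pd

# Dummy data copied from the question.
df = pd.DataFrame({
    'Name': ['Farmer_1', 'Farmer_1', 'Farmer_2', 'Farmer_2', 'Farmer_2',
             'Farmer_3', 'Farmer_3', 'Farmer_4', 'Farmer_5', 'Farmer_5'],
    'Village': ['Village_1'] * 10,
    'Survey_no': [26, 26, 26, 26, 26, 26, 26, 151, 151, 151],
    'Land_Area': [0.33, 0.40, 0.30, 0.40, 0.50, 0.52, 0.40, 0.23, 0.25, 0.10],
})

# Like bbb, but keeping Village and Survey_no so the result can be merged.
per_farmer = (df.groupby(['Village', 'Survey_no', 'Name'], as_index=False)['Land_Area']
                .sum()
                .rename(columns={'Land_Area': 'Farmer_Land_Area'}))

# Like aaa: total area and number of distinct farmers per survey number.
summary = (df.groupby(['Village', 'Survey_no'], as_index=False)
             .agg(Total_Land_Area=('Land_Area', 'sum'),
                  No_of_Farmers=('Name', 'nunique')))

# One row per farmer, with the survey-level totals repeated alongside.
combined = per_farmer.merge(summary, on=['Village', 'Survey_no'], how='left')
print(combined)
```

This gives a long table (one row per farmer per survey number) instead of the wide layout, which may be easier to work with on 50,000 farmers.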
Update:
dfs = df.groupby(['Name', 'Village', 'Survey_no']).agg('sum')
dfs = (dfs.reset_index(level=0)
          .set_index([dfs.groupby(['Village', 'Survey_no']).cumcount() + 1], append=True)
          .unstack()
          .sort_index(level=1, axis=1))
dfs.columns = [f'{i}_{j}' for i, j in dfs.columns]
dfs = dfs.assign(Total_Land_Area=dfs.filter(like='Land_Area').sum(axis=1))
dfs
Output:
Land_Area_1 Name_1 Land_Area_2 Name_2 Land_Area_3 Name_3 Total_Land_Area
Village Survey_no
Village_1 26 0.73 Farmer_1 1.20 Farmer_2 0.92 Farmer_3 2.85
151 0.23 Farmer_4 0.35 Farmer_5 NaN NaN 0.58
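The question also asks for the number of farmers per survey number, which the table above does not show. A hedged sketch of one way to add it: aggregate to one row per farmer first, number the farmers within each survey, reshape to wide, then count the non-null `Name_*` columns. The names `per_farmer`, `n`, `wide` and `No_of_Farmers` are illustrative assumptions, not part of the code above.

```python
import pandas as pd

# Dummy data copied from the question.
df = pd.DataFrame({
    'Name': ['Farmer_1', 'Farmer_1', 'Farmer_2', 'Farmer_2', 'Farmer_2',
             'Farmer_3', 'Farmer_3', 'Farmer_4', 'Farmer_5', 'Farmer_5'],
    'Village': ['Village_1'] * 10,
    'Survey_no': [26, 26, 26, 26, 26, 26, 26, 151, 151, 151],
    'Land_Area': [0.33, 0.40, 0.30, 0.40, 0.50, 0.52, 0.40, 0.23, 0.25, 0.10],
})

# Step 1: one row per farmer within each (Village, Survey_no).
per_farmer = df.groupby(['Village', 'Survey_no', 'Name'],
                        as_index=False, sort=False)['Land_Area'].sum()

# Step 2: number the farmers 1, 2, 3, ... inside each survey number.
per_farmer['n'] = per_farmer.groupby(['Village', 'Survey_no']).cumcount() + 1

# Step 3: reshape to one row per survey number, one column pair per farmer.
wide = (per_farmer.set_index(['Village', 'Survey_no', 'n'])
                  .unstack('n')
                  .sort_index(axis=1, level=1))
wide.columns = [f'{col}_{n}' for col, n in wide.columns]

# Step 4: per-survey totals and farmer counts.
wide['Total_Land_Area'] = wide.filter(like='Land_Area').sum(axis=1)
wide['No_of_Farmers'] = wide.filter(like='Name').notna().sum(axis=1)
print(wide)
```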
Try this:
cnt = df.groupby(['Village', 'Survey_no']).cumcount() + 1
dfs = (df.groupby(['Village', 'Survey_no', cnt])
         .agg({'Name': 'first', 'Land_Area': 'sum'})
         .unstack()
         .sort_index(level=1, axis=1))
dfs = dfs.assign(Total_Land_Area=dfs.filter(like='Land_Area').sum(axis=1))
dfs.columns = [f'{i}_{j}' if j else f'{i}' for i, j in dfs.columns]
dfs
Output:
Land_Area_1 Name_1 ... Name_7 Total_Land_Area
Village Survey_no ...
Village_1 26 0.33 Farmer_1 ... Farmer_3 2.85
151 0.23 Farmer_4 ... NaN 0.58
[2 rows x 15 columns]
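A note on why this version has 15 columns: `cumcount()` here numbers the raw rows within each survey number, not the farmers, so a farmer with several rows takes up several column pairs. A quick check of that on the dummy data (the variable names are just for illustration):

```python
import pandas as pd

# Dummy data copied from the question.
df = pd.DataFrame({
    'Name': ['Farmer_1', 'Farmer_1', 'Farmer_2', 'Farmer_2', 'Farmer_2',
             'Farmer_3', 'Farmer_3', 'Farmer_4', 'Farmer_5', 'Farmer_5'],
    'Village': ['Village_1'] * 10,
    'Survey_no': [26, 26, 26, 26, 26, 26, 26, 151, 151, 151],
    'Land_Area': [0.33, 0.40, 0.30, 0.40, 0.50, 0.52, 0.40, 0.23, 0.25, 0.10],
})

keys = ['Village', 'Survey_no']

# Rows per survey number: this is what a row-level cumcount() runs up to.
rows_per_survey = df.groupby(keys).size()

# Distinct farmers per survey number: this is what the wide table should have.
farmers_per_survey = df.groupby(keys)['Name'].nunique()

print(rows_per_survey.max(), farmers_per_survey.max())
```

Survey number 26 has 7 rows but only 3 farmers, so numbering rows yields 7 Name/Land_Area pairs plus the total (15 columns), while aggregating per farmer first yields 3 pairs.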