Summing observations from column in pandas
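Suppose I have a large DataFrame DS_df with the columns year, dealamount and CCS. For every year from 1985 to 2020 I need a separate pandas Series, e.g. sum_2019. Within each year, I need to sum dealamount over all rows that share the same CCS; a CCS that occurs only once in that year should still appear in the Series with its single amount: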
year dealamount CCS
0 2013 37,522,700 Albania_Azerbaijan
1 2013 37,522,700 Albania_Azerbaijan
2 2016 436,341,300 Albania_Greece
3 2019 763,189,200 Albania_Russia
4 2019 763,189,200 Albania_Russia
5 2019 763,189,200 Albania_Russia
6 2019 763,189,200 Albania_Russia
7 2017 150,931,000 Albania_Turkey
8 2016 275,293,750 Albania_Turkey
9 2009 258,328,000 Albania_Turkey
10 2019 153,452,000 Albania_Venezuela
11 2019 153,452,000 Albania_Venezuela
11 2017 153,452,000 Albania_Venezuela
So in this case, sum_2019 should be a pandas Series whose index is CCS and whose values ("observations") are the total deal amounts:
Albania_Russia 3,052,756,800
Albania_Venezuela 306,904,000
Similarly, sum_2013:
Albania_Azerbaijan 75,045,400
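Any help is greatly appreciated, since I need a lot of data points and feel quite lost (I'm really new to Python). How would I automate this properly?
Thanks!!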
Is this what you want?
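# strip the thousands separators, then convert the strings to integers
df['dealamount'] = df['dealamount'].str.replace(',', '', regex=False).astype(int)
# sum dealamount for every (year, CCS) pair
new_df = df.groupby(['year', 'CCS']).agg({'dealamount': 'sum'})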
Output:
dealamount
year CCS
2009 Albania_Turkey 258328000
2013 Albania_Azerbaijan 75045400
2016 Albania_Greece 436341300
Albania_Turkey 275293750
2017 Albania_Turkey 150931000
Albania_Venezuela 153452000
2019 Albania_Russia 3052756800
Albania_Venezuela 306904000
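A self-contained sketch of the approach above, run on the sample rows from the question (retyped by hand, with DS_df called df as in the answer). Selecting the column before summing gives a Series with a (year, CCS) index, and `.xs(2019, level="year")` is one way, not the only one, to pull out a per-year Series like sum_2019 without regrouping.

```python
import pandas as pd

# Sample rows copied from the question; dealamount is stored as text with commas.
df = pd.DataFrame({
    "year": [2013, 2013, 2016, 2019, 2019, 2019, 2019, 2017, 2016, 2009, 2019, 2019, 2017],
    "dealamount": ["37,522,700", "37,522,700", "436,341,300", "763,189,200", "763,189,200",
                   "763,189,200", "763,189,200", "150,931,000", "275,293,750", "258,328,000",
                   "153,452,000", "153,452,000", "153,452,000"],
    "CCS": ["Albania_Azerbaijan", "Albania_Azerbaijan", "Albania_Greece", "Albania_Russia",
            "Albania_Russia", "Albania_Russia", "Albania_Russia", "Albania_Turkey",
            "Albania_Turkey", "Albania_Turkey", "Albania_Venezuela", "Albania_Venezuela",
            "Albania_Venezuela"],
})

# Strip the thousands separators and convert to integers.
df["dealamount"] = df["dealamount"].str.replace(",", "", regex=False).astype("int64")

# One total per (year, CCS) pair; the result is a Series with a two-level index.
totals = df.groupby(["year", "CCS"])["dealamount"].sum()

# Select one value of the "year" level; the remaining index is CCS.
sum_2019 = totals.xs(2019, level="year")
sum_2013 = totals.xs(2013, level="year")
print(sum_2019)
```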
# to avoid scientific notation (e notation)
pd.set_option('display.float_format', lambda x: '%.d' % x)
# first filter by 'year', then group by 'CCS' and finally sum 'dealamount'
sum_2019 = df[df['year']==2019].groupby('CCS')['dealamount'].sum()
print(sum_2019)
CCS
Albania_Russia 3052756800
Albania_Venezuela 306904000
Name: dealamount, dtype: float64
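To cover every year from 1985 to 2020, a dictionary keyed by year is usually easier to work with than 36 separately named variables (sum_1985 ... sum_2020). A minimal sketch, assuming dealamount is already numeric; the few rows below are only a stand-in for the real DS_df, and the name `yearly_sums` is purely illustrative.

```python
import pandas as pd

# Stand-in for DS_df; assumes dealamount has already been converted to numbers
# (for example with the str.replace(...).astype(int) step from the first answer).
df = pd.DataFrame({
    "year": [2013, 2013, 2016, 2019, 2019, 2019, 2019],
    "dealamount": [37522700, 37522700, 436341300, 763189200, 763189200, 763189200, 763189200],
    "CCS": ["Albania_Azerbaijan", "Albania_Azerbaijan", "Albania_Greece",
            "Albania_Russia", "Albania_Russia", "Albania_Russia", "Albania_Russia"],
})

# One Series per year, indexed by CCS; years without deals get an empty Series.
yearly_sums = {}
for year in range(1985, 2021):  # 1985 through 2020 inclusive
    rows = df[df["year"] == year]
    yearly_sums[year] = rows.groupby("CCS")["dealamount"].sum()

print(yearly_sums[2019])
```

`yearly_sums[2019]` then plays the role of sum_2019. If a single table is more convenient, `df.groupby(['CCS', 'year'])['dealamount'].sum().unstack()` puts each year in its own column, with NaN where a CCS has no deals in that year.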