Aggregate function to data frame in pandas
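I want to create a data frame from an aggregate function. I thought it would create a data frame by default, as this solution states (Converting a Pandas GroupBy object to DataFrame), but it creates a Series instead and I don't know why.
The data frame comes from the San Francisco Salaries dataset on Kaggle. My code: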
df=pd.read_csv('Salaries.csv')
in: type(df)
out: pandas.core.frame.DataFrame
in: df.head()
out: EmployeeName JobTitle TotalPay TotalPayBenefits Year Status 2BasePay 2OvertimePay 2OtherPay 2Benefits 2Year
0 NATHANIEL FORD GENERAL MANAGER-METROPOLITAN TRANSIT AUTHORITY 567595.43 567595.43 2011 NaN 167411.18 0.00 400184.25 NaN 2011-01-01
1 GARY JIMENEZ CAPTAIN III (POLICE DEPARTMENT) 538909.28 538909.28 2011 NaN 155966.02 245131.88 137811.38 NaN 2011-01-01
2 ALBERT PARDINI CAPTAIN III (POLICE DEPARTMENT) 335279.91 335279.91 2011 NaN 212739.13 106088.18 16452.60 NaN 2011-01-01
3 CHRISTOPHER CHONG WIRE ROPE CABLE MAINTENANCE MECHANIC 332343.61 332343.61 2011 NaN 77916.00 56120.71 198306.90 NaN 2011-01-01
4 PATRICK GARDNER DEPUTY CHIEF OF DEPARTMENT,(FIRE DEPARTMENT) 326373.19 326373.19 2011 NaN 134401.60 9737.00 182234.59 NaN 2011-01-01
in: df2=df.groupby(['JobTitle'])['TotalPay'].mean()
type(df2)
out: pandas.core.series.Series
I want df2 to be a data frame with the columns 'JobTitle' and 'TotalPay'.
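Breaking down your code:
df2 = df.groupby(['JobTitle'])['TotalPay'].mean()
The groupby is fine. The ['TotalPay'] is the misstep. It tells the groupby to execute the mean function only on the pd.Series df['TotalPay'], for each group defined by ['JobTitle']. Instead, refer to this column with [['TotalPay']]. Note the double brackets: they select a pd.DataFrame rather than a pd.Series.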
Recap
df2 = df.groupby(['JobTitle'])[['TotalPay']].mean()
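A minimal sketch of the difference, using a tiny made-up frame instead of Salaries.csv (the pay values here are invented for illustration). It also shows one thing the recap leaves implicit: with double brackets the result is a data frame, but 'JobTitle' is its index, so reset_index() or as_index=False is one way to turn it into a regular column.

```python
import pandas as pd

# Hypothetical stand-in for Salaries.csv; the pay values are made up.
df = pd.DataFrame({
    "JobTitle": [
        "CAPTAIN III (POLICE DEPARTMENT)",
        "CAPTAIN III (POLICE DEPARTMENT)",
        "DEPUTY CHIEF OF DEPARTMENT,(FIRE DEPARTMENT)",
    ],
    "TotalPay": [100.0, 200.0, 300.0],
})

# Single brackets select a Series, so the aggregate is a Series.
as_series = df.groupby(["JobTitle"])["TotalPay"].mean()

# Double brackets select a DataFrame, so the aggregate is a DataFrame.
# 'JobTitle' is the index here, not a column.
as_frame = df.groupby(["JobTitle"])[["TotalPay"]].mean()

# Move 'JobTitle' out of the index into a regular column.
with_column = as_frame.reset_index()

# Alternative: ask groupby not to use the keys as the index.
same_thing = df.groupby(["JobTitle"], as_index=False)[["TotalPay"]].mean()

print(type(as_series).__name__)
print(type(as_frame).__name__)
print(list(with_column.columns))
```

Either with_column or same_thing matches the shape asked for in the question: a data frame with 'JobTitle' and 'TotalPay' as columns.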