Getting the total of another column for a specific day of the week in a month?

Question

   VendorID tpep_pickup_datetime tpep_dropoff_datetime  passenger_count  PULocationID  DOLocationID  fare_amount
0       1.0  2020-01-01 00:28:15   2020-01-01 00:33:03              1.0           238           239          6.0
1       1.0  2020-01-01 00:35:39   2020-01-01 00:43:04              1.0           239           238          7.0
2       1.0  2020-01-01 00:47:41   2020-01-01 00:53:52              1.0           238           238          6.0
3       1.0  2020-01-01 00:55:23   2020-01-01 01:00:14              1.0           238           151          5.5
4       2.0  2020-01-01 00:01:58   2020-01-01 00:04:16              1.0           193           193          3.5
5       2.0  2020-01-01 00:09:44   2020-01-01 00:10:37              1.0             7           193          2.5
6       2.0  2020-01-01 00:39:25   2020-01-01 00:39:29              1.0           193           193          2.5
7       1.0  2020-01-01 00:29:01   2020-01-01 00:40:28              2.0           246            48          8.0
8       1.0  2020-01-01 00:55:11   2020-01-01 01:12:03              2.0           246            79         12.0
9       1.0  2020-01-01 00:37:15   2020-01-01 00:51:41              1.0           163           161          9.5

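I have data for January 2020 (it covers the whole month; this is only a snippet), and I want to answer questions such as 'Saturday is the busiest day in terms of passenger pickups.' How can I do that? The columns labeled 'tpep_pickup_datetime' and 'tpep_dropoff_datetime' have dtype object.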
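Because both datetime columns are stored as strings (dtype object), weekday logic cannot be applied until they are parsed. A minimal sketch, assuming every value follows the 'YYYY-MM-DD HH:MM:SS' pattern seen in the snippet (the explicit format string is an assumption based on those rows). Depending on the pandas version, the columns may be reported as object or as a string dtype before conversion.

```python
import pandas as pd

# A few rows from the question; the datetime columns arrive as plain strings
df = pd.DataFrame({
    "tpep_pickup_datetime": ["2020-01-01 00:28:15", "2020-01-01 00:35:39", "2020-01-01 00:47:41"],
    "tpep_dropoff_datetime": ["2020-01-01 00:33:03", "2020-01-01 00:43:04", "2020-01-01 00:53:52"],
    "passenger_count": [1.0, 1.0, 1.0],
})
print(df.dtypes)  # both datetime columns are string-typed (object) here

# Parse with an explicit format (assumed from the sample rows);
# malformed strings raise an error instead of being guessed at
for col in ["tpep_pickup_datetime", "tpep_dropoff_datetime"]:
    df[col] = pd.to_datetime(df[col], format="%Y-%m-%d %H:%M:%S")

print(df.dtypes)  # both columns are now datetime64
print(df["tpep_pickup_datetime"].dt.day_name().iloc[0])  # 2020-01-01 was a Wednesday
```

Once the columns are datetime64, the `.dt` accessor (for example `.dt.day_name()`) becomes available.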

Answer 1

To get a more useful sample, the values in the tpep_pickup_datetime column were changed to different dates:

print (df)
   VendorID tpep_pickup_datetime tpep_dropoff_datetime  passenger_count  \
0       1.0  2020-01-01 00:28:15   2020-01-01 00:33:03              1.0   
1       1.0  2020-01-02 00:35:39   2020-01-01 00:43:04              1.0   
2       1.0  2020-01-02 00:47:41   2020-01-01 00:53:52              1.0   
3       1.0  2020-01-03 00:55:23   2020-01-01 01:00:14              1.0   
4       2.0  2020-01-03 00:01:58   2020-01-01 00:04:16              1.0   
5       2.0  2020-01-03 00:09:44   2020-01-01 00:10:37              1.0   
6       2.0  2020-01-04 00:39:25   2020-01-01 00:39:29              1.0   
7       1.0  2020-01-04 00:29:01   2020-01-01 00:40:28              2.0   
8       1.0  2020-01-04 00:55:11   2020-01-01 01:12:03              2.0   
9       1.0  2020-01-05 00:37:15   2020-01-01 00:51:41              1.0   

   PULocationID  DOLocationID  fare_amount  
0           238           239          6.0  
1           239           238          7.0  
2           238           238          6.0  
3           238           151          5.5  
4           193           193          3.5  
5             7           193          2.5  
6           193           193          2.5  
7           246            48          8.0  
8           246            79         12.0  
9           163           161          9.5

Convert the column to datetimes, get the day names with Series.dt.day_name, and aggregate with sum:

import pandas as pd

df['tpep_pickup_datetime'] = pd.to_datetime(df['tpep_pickup_datetime'])

df['day'] = df['tpep_pickup_datetime'].dt.day_name()

s = df.groupby('day')['passenger_count'].sum()
print (s)
day
Friday       3.0
Saturday     5.0
Sunday       1.0
Thursday     2.0
Wednesday    1.0
Name: passenger_count, dtype: float64
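The same steps as one self-contained script, using the pickup times and passenger counts copied from the modified sample above. `groupby` sorts the day names alphabetically, so this sketch additionally reindexes the result into calendar order; the `week` list and the choice to fill missing days with 0 are illustrative additions, not part of the original answer.

```python
import pandas as pd

# Pickup times and passenger counts from the modified sample above
df = pd.DataFrame({
    "tpep_pickup_datetime": [
        "2020-01-01 00:28:15", "2020-01-02 00:35:39", "2020-01-02 00:47:41",
        "2020-01-03 00:55:23", "2020-01-03 00:01:58", "2020-01-03 00:09:44",
        "2020-01-04 00:39:25", "2020-01-04 00:29:01", "2020-01-04 00:55:11",
        "2020-01-05 00:37:15",
    ],
    "passenger_count": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 1.0],
})

df["tpep_pickup_datetime"] = pd.to_datetime(df["tpep_pickup_datetime"])
df["day"] = df["tpep_pickup_datetime"].dt.day_name()

# Total passengers per weekday name (index sorted alphabetically)
s = df.groupby("day")["passenger_count"].sum()

# Reorder into calendar order; weekdays with no pickups in the data
# (Monday and Tuesday here) are filled with 0 instead of NaN
week = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
s_ordered = s.reindex(week, fill_value=0)
print(s_ordered)
```

Reindexing does not change the totals; it only makes the output easier to read as a week.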

Then use Series.idxmax to get the index label of the maximum, and max to get the maximum value itself:

print (s.idxmax())
Saturday

print (s.max())
5.0
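"Passenger pickups" could also be read as the number of trips rather than the number of passengers. If that reading is intended, one possible variant is to count rows per weekday instead of summing `passenger_count`. A sketch on the same modified sample (data copied from the printout above); the tie check is an illustrative addition, since `idxmax` returns only the first label that reaches the maximum.

```python
import pandas as pd

# Pickup times from the modified sample above (one row per trip)
df = pd.DataFrame({
    "tpep_pickup_datetime": pd.to_datetime([
        "2020-01-01 00:28:15", "2020-01-02 00:35:39", "2020-01-02 00:47:41",
        "2020-01-03 00:55:23", "2020-01-03 00:01:58", "2020-01-03 00:09:44",
        "2020-01-04 00:39:25", "2020-01-04 00:29:01", "2020-01-04 00:55:11",
        "2020-01-05 00:37:15",
    ]),
})
df["day"] = df["tpep_pickup_datetime"].dt.day_name()

# Each row is one trip, so the group size is the number of pickups per weekday
trips = df.groupby("day").size()
print(trips)

# Keep every weekday that reaches the maximum, not just the first one
top = trips[trips == trips.max()]
print(top)
```

In this sample Friday and Saturday both have three trips, so `trips.idxmax()` alone would report only Friday, even though Saturday has the most passengers.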

If you need both, use Series.agg:

print (s.agg(['idxmax','max']))
idxmax    Saturday
max              5
Name: passenger_count, dtype: object
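For the full month, note that January 2020 starts on a Wednesday, so it contains five Wednesdays, Thursdays and Fridays but only four of every other weekday. Raw sums per day name therefore favor those three days. A minimal sketch of one way to compensate: total the passengers per calendar date first, then average those daily totals per weekday name. It assumes every date in the month has at least one trip (a date with no rows would simply be missing from the average). The sample rows are copied from the printout above; with only one date per weekday there, the averages equal the sums.

```python
import pandas as pd

# How many times each weekday occurs in January 2020
days_in_month = pd.date_range("2020-01-01", "2020-01-31", freq="D")
weekday_counts = days_in_month.day_name().value_counts()
print(weekday_counts)

# Pickup times and passenger counts from the modified sample above
df = pd.DataFrame({
    "tpep_pickup_datetime": pd.to_datetime([
        "2020-01-01 00:28:15", "2020-01-02 00:35:39", "2020-01-02 00:47:41",
        "2020-01-03 00:55:23", "2020-01-03 00:01:58", "2020-01-03 00:09:44",
        "2020-01-04 00:39:25", "2020-01-04 00:29:01", "2020-01-04 00:55:11",
        "2020-01-05 00:37:15",
    ]),
    "passenger_count": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 1.0],
})

# Step 1: total passengers per calendar date (time of day stripped by normalize)
daily = df.groupby(df["tpep_pickup_datetime"].dt.normalize())["passenger_count"].sum()

# Step 2: average the daily totals over all dates that share a weekday name
avg_by_day = daily.groupby(daily.index.day_name()).mean()
print(avg_by_day)
print(avg_by_day.idxmax())
```

On the whole month, this compares an "average Saturday" with an "average Friday" instead of four Saturdays with five Fridays.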


python-3.x

pandas

data-science