How to find the maximum available date of the month in a DataFrame in pandas/Python
Suppose I have a dataframe:
date,ent_id,val
2021-03-23,109,61
2021-03-12,104,64
2021-03-31,101,61
2021-03-30,103,64
2021-04-01,111,32
2021-04-01,153,39
2021-04-30,101,51
2021-04-30,103,53
2021-05-12,101,28
2021-05-07,103,26
2021-05-05,171,47
2021-05-05,183,61
2021-06-06,131,45
2021-06-06,133,78
2021-06-30,101,23
2021-06-30,103,31
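I want to find the maximum available date for each month, that is, the latest date that actually appears in the data for that month.
I know how to do this in SQL:
max(date) over (partition by date_part(year,date),date_part(month,date))
But I can't work out the equivalent logic in pandas, or whether there is a built-in function for it.
So the output would look like: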
date,ent_id,val,max_avl_d
2021-03-23,109,61,2021-03-31
2021-03-12,104,64,2021-03-31
2021-03-31,101,61,2021-03-31
2021-03-30,103,64,2021-03-31
2021-04-01,111,32,2021-04-30
2021-04-01,153,39,2021-04-30
2021-04-30,101,51,2021-04-30
2021-04-30,103,53,2021-04-30
2021-05-12,101,28,2021-05-12
2021-05-07,103,26,2021-05-12
2021-05-05,171,47,2021-05-12
2021-05-05,183,61,2021-05-12
2021-06-06,131,45,2021-06-30
2021-06-06,133,78,2021-06-30
2021-06-30,101,23,2021-06-30
2021-06-30,103,31,2021-06-30
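Here is one solution you can try:
import pandas as pd

# Parse the date strings into datetime64 values
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
# Group by (year, month) and broadcast each group's latest date back to its rows
df['max_date'] = (
    df.groupby([df['date'].dt.year, df['date'].dt.month])['date'].transform('max')
)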
date ent_id val max_date
0 2021-03-23 109 61 2021-03-31
1 2021-03-12 104 64 2021-03-31
2 2021-03-31 101 61 2021-03-31
...
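For reference, here is a self-contained sketch of the same groupby/transform idea on a few rows taken from the question's sample data. Building the frame inline and trimming it to seven rows are assumptions made only to keep the example runnable. Grouping on a monthly period via `dt.to_period('M')` is used here as an equivalent to grouping on year and month separately.

```python
import pandas as pd

# A subset of the question's sample rows, built inline for illustration
df = pd.DataFrame({
    "date": ["2021-03-23", "2021-03-31", "2021-05-12", "2021-05-07",
             "2021-05-05", "2021-06-06", "2021-06-30"],
    "ent_id": [109, 101, 101, 103, 171, 131, 101],
    "val": [61, 61, 28, 26, 47, 45, 23],
})
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")

# One group per calendar month; transform('max') returns a value for every row,
# which mirrors max(date) over (partition by year, month) in SQL
df["max_avl_d"] = df.groupby(df["date"].dt.to_period("M"))["date"].transform("max")
print(df)
```

With this data the May rows should all get 2021-05-12, since that is the latest date present in May.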
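Alternatively, we can use to_period() and to_timestamp() to get the last calendar day of the month for each date. Note that this is the calendar month end, not the latest date present in the data, so the May rows below show 2021-05-31 rather than the 2021-05-12 requested in the question.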
df['max_avl_d'] = df.date.dt.to_period('M').dt.to_timestamp('M')
Output:
date ent_id val max_avl_d
0 2021-03-23 109 61 2021-03-31
1 2021-03-12 104 64 2021-03-31
2 2021-03-31 101 61 2021-03-31
3 2021-03-30 103 64 2021-03-31
4 2021-04-01 111 32 2021-04-30
5 2021-04-01 153 39 2021-04-30
6 2021-04-30 101 51 2021-04-30
7 2021-04-30 103 53 2021-04-30
8 2021-05-12 101 28 2021-05-31
9 2021-05-07 103 26 2021-05-31
10 2021-05-05 171 47 2021-05-31
11 2021-05-05 183 61 2021-05-31
12 2021-06-06 131 45 2021-06-30
13 2021-06-06 133 78 2021-06-30
14 2021-06-30 101 23 2021-06-30
15 2021-06-30 103 31 2021-06-30
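To make the difference between the two answers concrete, the following sketch computes both values side by side for the May rows. `pd.offsets.MonthEnd(0)` is used here as a stand-in for the to_period/to_timestamp round trip (an assumption for illustration: it rolls each date forward to its calendar month end), and the column names `cal_month_end` and `max_avl_d` are hypothetical.

```python
import pandas as pd

# The three distinct May dates from the question's sample data
may = pd.DataFrame({"date": pd.to_datetime(["2021-05-12", "2021-05-07", "2021-05-05"])})

# Calendar month end: independent of which dates exist in the data
may["cal_month_end"] = may["date"] + pd.offsets.MonthEnd(0)

# Latest date actually present in each month of the data
may["max_avl_d"] = may.groupby(may["date"].dt.to_period("M"))["date"].transform("max")
print(may)
```

The two columns agree only for months whose last calendar day appears in the data, as it does for March, April and June in the sample.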