How to filter data in Pandas by custom date?

Question

I have a Pandas DataFrame with these columns:

name age mm yy
         01 23

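How can I use a custom function (with apply) to filter the rows where mmyy < the current date?

Both mm and yy are listed as 8 non-null object, so they have dtype object.

This is what I tried: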

from datetime import datetime

# mm = datetime.strptime(mm, "%m").strftime("%m")
# mm = datetime.strptime(mm, "%b").strftime("%m")  #str

mm = '06'
yy = '22'

cdd, cmm, cyy = datetime.today().strftime("%d %m %Y").split()
yy = datetime.strptime(yy, '%y').strftime("%Y")
mm = datetime.strptime(mm, "%m").strftime("%m")
 
dt = datetime(year=int(yy), month=int(mm), day=1)
present = datetime.now()
print(dt < present)

So how do I wrap this in a custom function and use it to filter the DataFrame?

Answer 1

Assuming the mm and yy columns contain strings, here is one way to do what your question asks:

import pandas as pd
df = pd.DataFrame({'name':['Alice', 'Bob'], 'age':[20, 30], 'mm':['01', '04'], 'yy':['23', '22']})
print(df)

from datetime import datetime
now = datetime.today().date()
df = df[df.apply(lambda x: datetime.strptime(f"{x.yy}/{x.mm}/01", "%y/%m/%d").date() < now, axis=1)]
print(df)

Input

    name  age  mm  yy
0  Alice   20  01  23
1    Bob   30  04  22

Output (the result depends on the run date; shown for a run after April 1, 2022 and before 2023)

  name  age  mm  yy
1  Bob   30  04  22
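
Since the question asks for a custom function, the lambda above can also be spelled out as a named function and handed to apply. This is only a minimal sketch: the names is_before and cutoff are made up for illustration, and the cutoff is passed in explicitly (with a fixed date here) so that the result does not change with the day the code is run.

```python
from datetime import date, datetime

import pandas as pd


def is_before(row, cutoff):
    """Return True if the first day of the row's mm/yy month is before cutoff.

    is_before and cutoff are hypothetical names for this sketch.
    Assumes row.mm is a zero-padded month string and row.yy a two-digit year string.
    """
    first_of_month = datetime.strptime(f"{row.yy}/{row.mm}/01", "%y/%m/%d").date()
    return first_of_month < cutoff


df = pd.DataFrame({'name': ['Alice', 'Bob'], 'age': [20, 30],
                   'mm': ['01', '04'], 'yy': ['23', '22']})

# A fixed cutoff keeps the example reproducible; pass date.today() for the real filter.
cutoff = date(2022, 6, 15)

# Extra keyword arguments to apply are forwarded to the function.
filtered = df[df.apply(is_before, axis=1, cutoff=cutoff)]
print(filtered)
```

With the fixed cutoff above, only the row for Bob (04/22) is kept. Swapping in date.today() gives the same behaviour as the lambda version.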

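UPDATE: In response to OP's question in the comments, if the month is an abbreviated name such as Jan for January, here is how to handle it (the full list of format codes is in the Python datetime documentation):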

import pandas as pd
df = pd.DataFrame({'name':['Alice', 'Bob'], 'age':[20,30], 'mm':['Jan','Apr'], 'yy':['23','22']})
print(df)

from datetime import datetime
now = datetime.today().date()
df = df[df.apply(lambda x: datetime.strptime(f"{x.yy}/{x.mm}/01", "%y/%b/%d").date() < now, axis=1)]
print(df)

Input

    name  age   mm  yy
0  Alice   20  Jan  23
1    Bob   30  Apr  22

Output (the result depends on the run date; shown for a run after April 1, 2022 and before 2023)

  name  age   mm  yy
1  Bob   30  Apr  22
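
A row-wise apply can get slow on large frames. As an alternative sketch (not part of the answer as posted), the two string columns can be joined and parsed in a single pd.to_datetime call. When the format has no day field, the parsed value falls on the first of the month. The variable names and the fixed cutoff are assumptions for illustration, and %b matches English abbreviations only under an English locale.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob'], 'age': [20, 30],
                   'mm': ['Jan', 'Apr'], 'yy': ['23', '22']})

# Parse "yy/mm" for the whole column at once; the day defaults to 1.
month_start = pd.to_datetime(df['yy'] + '/' + df['mm'], format='%y/%b')

# Fixed cutoff for reproducibility; pd.Timestamp.today().normalize()
# gives midnight of the current day for the real filter.
cutoff = pd.Timestamp(2022, 6, 15)

# Boolean mask built from a vectorized comparison instead of apply.
filtered = df[month_start < cutoff]
print(filtered)
```

For zero-padded numeric months, the same sketch should work with format='%y/%m'.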

UPDATE #2: Here is a way to make this approach work with heterogeneous mm values (some like '01', some like 'Jan' for January):

import pandas as pd
df = pd.DataFrame({'name':['Alice', 'Bob', 'Carol', 'Dexter'], 'age':[20,30,40,50], 'mm':['Jan','Apr', '02', '03'], 'yy':['23','22','22','23']})
print(df)

from datetime import datetime
now = datetime.today().date()
df = df[df.apply(lambda x: datetime.strptime(f"{x.yy}/{x.mm}/01", "%y/%b/%d" if str.isalpha(x.mm) else "%y/%m/%d").date() < now, axis=1)]
print(df)

Input

     name  age   mm  yy
0   Alice   20  Jan  23
1     Bob   30  Apr  22
2   Carol   40   02  22
3  Dexter   50   03  23

Output (the result depends on the run date; shown for a run after April 1, 2022 and before 2023)

    name  age   mm  yy
1    Bob   30  Apr  22
2  Carol   40   02  22
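
The mixed case can also be handled without a per-row format switch. The sketch below is one possible variation, not the method used in the answer: it parses the column once per format with errors='coerce', so values that do not match become NaT, and then fills the gaps of the first parse from the second. The variable names and the fixed cutoff are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob', 'Carol', 'Dexter'],
                   'age': [20, 30, 40, 50],
                   'mm': ['Jan', 'Apr', '02', '03'],
                   'yy': ['23', '22', '22', '23']})

joined = df['yy'] + '/' + df['mm']

# Try each format on the whole column; values that do not match become NaT.
as_number = pd.to_datetime(joined, format='%y/%m', errors='coerce')
as_name = pd.to_datetime(joined, format='%y/%b', errors='coerce')

# Keep the numeric parse where it worked, otherwise use the month-name parse.
month_start = as_number.fillna(as_name)

# Fixed cutoff for reproducibility; use pd.Timestamp.today().normalize() in practice.
cutoff = pd.Timestamp(2022, 6, 15)
filtered = df[month_start < cutoff]
print(filtered)
```

A value that matches neither format stays NaT, and NaT compares as False, so such rows are dropped by the filter rather than raising an error. Whether that is desirable depends on the data.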
