Why is my python/pandas code not filtering the Termination date correctly?
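My Python script imports an xlsx file, removes some IDs, and is then supposed to filter out termination dates based on my "term_date" variable. Since today is June 12, I don't want any termination date earlier than March 14, 2018 to show up in my output. However, I'm seeing February termination dates. Any idea why?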
import pandas as pd
from datetime import datetime, timedelta
TODAY = datetime.today().strftime("%d%m%Y")
term_date = (datetime.today() - timedelta(days=90))
#term_date = (pd.to_datetime('today') - pd.Timedelta(days=90))
remove_id = ['381998','201439']
df = pd.read_excel('Details.xlsx')
df = df[~df['Employee ID'].isin(remove_id)]
df['Termination Date'] = df['Termination Date'].astype(str)
df['Termination Date'] = df['Termination Date'].str.replace('nan', '1/1/2050')
df['Termination Date'] = pd.to_datetime(df['Termination Date'])
df['Hire Date'] = pd.to_datetime(df['Hire Date'])
df['Home Address Line 1'] = df['Home Address Line 1'].str.replace(',', '')
df['Home Address Line 2'] = df['Home Address Line 2'].str.replace(',', '')
df['Shipping Address Line 1'] = df['Shipping Address Line 1'].str.replace(',', '')
df['Shipping Address Line 2'] = df['Shipping Address Line 2'].str.replace(',', '')
df2 = df[df['Termination Date'] >= term_date]
df2.to_excel('roster_file2_' + TODAY + '.xlsx')
A sample of my dataframe:
Employee ID Termination Date Hire Date Home Address Line 1
234254 2/1/2018 1/1/2015 20 Main St
675867 5/2/2018 1/1/2015 10 Elm St
345665 1/1/2050 1/1/2015 1 Chestnut St
974445 1/1/2050 1/1/2015 12 Cherry St
235465 11/3/2017 1/1/2015 9 Lucky St
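This looks like a datetime format problem. Try passing dayfirst=True when converting to datetime. If the dates are stored day-first, the default month-first parsing reads 5/2/2018 as May 2, 2018 rather than February 5, 2018, so that row passes the filter.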
df.TerminationDate = pd.to_datetime(df.TerminationDate,dayfirst=True)
df[df.TerminationDate >= term_date]
Out[519]:
EmployeeID TerminationDate HireDate HomeAddress
2 345665 2050-01-01 01/01/2015 1 ChestnutSt
3 974445 2050-01-01 01/01/2015 12 CherrySt
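To see the difference on the sample rows from the question, here is a minimal sketch. It assumes the termination dates arrive as day-first strings (D/M/YYYY) and pins "today" to June 12, 2018 so the result is reproducible. The names month_first, day_first, kept_month_first and kept_day_first are only illustrative and do not come from the original script.

```python
import pandas as pd

# Sample rows from the question, kept as strings.
# Assumption: the strings are day-first (D/M/YYYY).
df = pd.DataFrame({
    "Employee ID": [234254, 675867, 345665, 974445, 235465],
    "Termination Date": ["2/1/2018", "5/2/2018", "1/1/2050", "1/1/2050", "11/3/2017"],
})

# Fixed cutoff instead of datetime.today(): 90 days before June 12, 2018,
# which is March 14, 2018.
term_date = pd.Timestamp("2018-06-12") - pd.Timedelta(days=90)

# Default parsing is month-first, so "5/2/2018" becomes May 2, 2018.
month_first = pd.to_datetime(df["Termination Date"])

# With dayfirst=True the same string becomes February 5, 2018.
day_first = pd.to_datetime(df["Termination Date"], dayfirst=True)

# Apply the same filter as the question to both interpretations.
kept_month_first = df[month_first >= term_date]
kept_day_first = df[day_first >= term_date]

print(kept_month_first["Employee ID"].tolist())  # [675867, 345665, 974445]
print(kept_day_first["Employee ID"].tolist())    # [345665, 974445]
```

With month-first parsing, employee 675867 slips through the filter. With dayfirst=True only the two placeholder 2050 rows remain, matching the output above. If every value is known to be D/M/YYYY, passing format="%d/%m/%Y" to pd.to_datetime is a stricter alternative, since it raises an error on strings that do not match instead of guessing.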