How to remove negative times from a dataframe?
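I created a dataframe in pandas that shows the computed total time taken to complete each work order. Because of human input errors, some of the computed times come out negative, as you can see in row 30. Even if I switch AM to PM, the result is still wrong, because working hours are 07:30 to 16:00. It would be best to ignore these rows.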
    Work Order     WorkType  AST                  AFT                  comp_time
10  BAEBRO-898690  RM        1900-01-01 06:27:41  1900-01-01 08:05:28  01:37:47
13  BAEBRO-914693  RM        1900-01-01 08:30:00  1900-01-01 09:00:00  00:30:00
27  BAEBRO-898787  RM        1900-01-01 10:00:00  1900-01-01 10:30:00  00:30:00
30  BAEBRO-914680  RM        1900-01-01 14:32:08  1900-01-01 10:37:17  -1 days +20:05:09
37  BAEBRO-914660  RM        1900-01-01 10:47:39  1900-01-01 11:32:02  00:44:23
The code I used to get this result is:
import pandas as pd
from datetime import time
from datetime import timedelta
from pandas import DataFrame
import matplotlib.pyplot as plt
df = pd.read_excel('C:/Users/Nativ_Zero/Desktop/work data/July.xls')
df_work = df[['Work Order', 'WorkType', 'AST','AFT']]
df_work['AFT'] = pd.to_datetime(df_work['AFT'], format='%H:%M:%S', errors='coerce')
df_work['AST'] = pd.to_datetime(df_work['AST'], format='%H:%M:%S', errors='coerce')
rm_work = df_work[df_work.WorkType == 'RM']
rm_work['comp_time'] = rm_work['AFT'] - rm_work['AST']
rm_work.head()
The following code should work for you:
import pandas as pd

df = pd.read_excel('C:/Users/Nativ_Zero/Desktop/work data/July.xls')
# .copy() avoids SettingWithCopyWarning when columns are assigned on the subset
df_work = df[['Work Order', 'WorkType', 'AST', 'AFT']].copy()
df_work['AFT'] = pd.to_datetime(df_work['AFT'], format='%H:%M:%S', errors='coerce')
df_work['AST'] = pd.to_datetime(df_work['AST'], format='%H:%M:%S', errors='coerce')
rm_work = df_work[df_work.WorkType == 'RM'].copy()
rm_work['comp_time'] = rm_work['AFT'] - rm_work['AST']
# Filtering condition: keep only rows whose duration is zero or positive
rm_work = rm_work[rm_work.comp_time >= pd.Timedelta(0)]
rm_work.head()
You need to compare against a value of the appropriate data type, in this case a Timedelta.
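To make the data type point concrete, here is a minimal sketch. The `sample` frame is a hypothetical stand-in for `rm_work`, typed in by hand from the five rows shown in the question rather than read from the Excel file. It shows that comparing the column with a plain `0` fails, while comparing with `pd.Timedelta(0)` yields a usable boolean mask.

```python
import pandas as pd

# Hypothetical stand-in for rm_work, built from the sample rows in the question.
sample = pd.DataFrame(
    {
        "AST": ["06:27:41", "08:30:00", "10:00:00", "14:32:08", "10:47:39"],
        "AFT": ["08:05:28", "09:00:00", "10:30:00", "10:37:17", "11:32:02"],
    },
    index=[10, 13, 27, 30, 37],
)
for col in ("AST", "AFT"):
    sample[col] = pd.to_datetime(sample[col], format="%H:%M:%S", errors="coerce")
sample["comp_time"] = sample["AFT"] - sample["AST"]

# Ordering comparisons between a timedelta column and a plain int raise TypeError.
try:
    sample["comp_time"] >= 0
    int_comparison_failed = False
except TypeError:
    int_comparison_failed = True

# Comparing with a Timedelta works; row 30 (negative duration) is filtered out.
kept = sample[sample["comp_time"] >= pd.Timedelta(0)]
print(kept.index.tolist())
```

The boolean mask does the filtering in a single vectorized step, so no helper function is needed for the simple case.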
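Use .apply() to check whether each pandas time is negative (be sure to compare against pd.Timedelta(0) and not just 0, because that will raise an error). If it is negative, return a numpy NaN. Finally, drop the rows that contain NaN.
If your column already contains NaN values that you want to keep, this will cause a problem! In that case, you can change the function to return some other value, and then exclude that unique value.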
import numpy as np
import pandas as pd

def check_if_negative(pd_time):
    if pd_time >= pd.Timedelta(0):  # positive time and 0 time
        return pd_time
    elif pd_time < pd.Timedelta(0):  # negative time
        return np.nan  # the np.NaN alias was removed in NumPy 2.0
    else:  # reached for missing values (NaT), where both comparisons are False
        print(f'problem! {pd_time} has an issue')  # quick error check

rm_work['comp_time'] = rm_work['AFT'] - rm_work['AST']  # create timedelta
rm_work['comp_time'] = rm_work.comp_time.apply(check_if_negative)  # apply check to column
rm_work = rm_work.dropna(subset=['comp_time'])  # delete rows with NaN
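If missing values that are already in comp_time need to survive, one option that avoids a sentinel value altogether is a boolean mask that explicitly keeps them. This is a sketch of a different technique from the .apply() approach above, not a drop-in copy of it, and the `times` series is made up for illustration.

```python
import pandas as pd

# Hypothetical comp_time column: a valid duration, a negative one,
# a missing value, and a zero duration.
times = pd.Series(
    [pd.Timedelta(minutes=30), pd.Timedelta(hours=-4), pd.NaT, pd.Timedelta(0)],
    index=[13, 30, 41, 52],
    name="comp_time",
)

# NaT compares as False against any Timedelta, so it has to be kept explicitly.
keep = times.isna() | (times >= pd.Timedelta(0))
filtered = times[keep]
print(filtered.index.tolist())
```

Only the negative entry is removed here; the missing value stays in place so it can be handled separately later.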