How to remove negative times from a dataframe?

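I created a dataframe in pandas that shows the calculated total time spent completing each work order. Because of human input errors, some of these calculations come out as negative times, as you can see in row 30. Even if I switch AM to PM, the time is still wrong, because working hours are between 07:30 and 16:00, so it is best to simply ignore these rows.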

    Work Order     WorkType  AST                  AFT                  comp_time
10  BAEBRO-898690  RM        1900-01-01 06:27:41  1900-01-01 08:05:28  01:37:47
13  BAEBRO-914693  RM        1900-01-01 08:30:00  1900-01-01 09:00:00  00:30:00
27  BAEBRO-898787  RM        1900-01-01 10:00:00  1900-01-01 10:30:00  00:30:00
30  BAEBRO-914680  RM        1900-01-01 14:32:08  1900-01-01 10:37:17  -1 days +20:05:09
37  BAEBRO-914660  RM        1900-01-01 10:47:39  1900-01-01 11:32:02  00:44:23
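To reproduce the issue without the Excel file, the sketch below rebuilds the five sample rows from the table in memory. The values are copied from the table; building the frame with pd.DataFrame is an assumption standing in for read_excel. Because AFT is earlier than AST in row 30, the subtraction produces a negative timedelta, which pandas displays as -1 days +20:05:09 (that is, minus 3 hours 54 minutes 51 seconds).

```python
import pandas as pd

# In-memory copy of the sample rows shown above (index labels kept as in the table).
df = pd.DataFrame(
    {
        "Work Order": ["BAEBRO-898690", "BAEBRO-914693", "BAEBRO-898787",
                       "BAEBRO-914680", "BAEBRO-914660"],
        "WorkType": ["RM"] * 5,
        "AST": ["06:27:41", "08:30:00", "10:00:00", "14:32:08", "10:47:39"],
        "AFT": ["08:05:28", "09:00:00", "10:30:00", "10:37:17", "11:32:02"],
    },
    index=[10, 13, 27, 30, 37],
)

# Parsing time-only strings with a format puts every value on the default date 1900-01-01.
df["AST"] = pd.to_datetime(df["AST"], format="%H:%M:%S")
df["AFT"] = pd.to_datetime(df["AFT"], format="%H:%M:%S")

# Subtracting two datetime64 columns yields a timedelta64 column; row 30 comes out negative.
df["comp_time"] = df["AFT"] - df["AST"]
print(df)
```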

The code that produces this result is:

import pandas as pd

df = pd.read_excel('C:/Users/Nativ_Zero/Desktop/work data/July.xls')

# .copy() makes df_work an independent frame, avoiding SettingWithCopyWarning below
df_work = df[['Work Order', 'WorkType', 'AST', 'AFT']].copy()

# Parse the time strings; values that cannot be parsed become NaT
df_work['AFT'] = pd.to_datetime(df_work['AFT'], format='%H:%M:%S', errors='coerce')
df_work['AST'] = pd.to_datetime(df_work['AST'], format='%H:%M:%S', errors='coerce')

# Keep only RM work orders
rm_work = df_work[df_work.WorkType == 'RM'].copy()

# Finish time minus start time gives a timedelta column
rm_work['comp_time'] = rm_work['AFT'] - rm_work['AST']
rm_work.head()

The following code should work for you:

import pandas as pd

df = pd.read_excel('C:/Users/Nativ_Zero/Desktop/work data/July.xls')

df_work = df[['Work Order', 'WorkType', 'AST', 'AFT']].copy()

df_work['AFT'] = pd.to_datetime(df_work['AFT'], format='%H:%M:%S', errors='coerce')
df_work['AST'] = pd.to_datetime(df_work['AST'], format='%H:%M:%S', errors='coerce')

rm_work = df_work[df_work.WorkType == 'RM'].copy()

rm_work['comp_time'] = rm_work['AFT'] - rm_work['AST']
# Filtering condition: keep only rows whose duration is zero or positive
rm_work = rm_work[rm_work.comp_time >= pd.Timedelta(0)]
rm_work.head()
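A minimal, self-contained check of the filtering line, assuming a small hypothetical rm_work built in memory (three rows taken from the question's table) instead of one read from July.xls:

```python
import pandas as pd

# Hypothetical stand-in for rm_work: only the two time columns are needed here.
rm_work = pd.DataFrame(
    {
        "AST": pd.to_datetime(["06:27:41", "14:32:08", "10:47:39"], format="%H:%M:%S"),
        "AFT": pd.to_datetime(["08:05:28", "10:37:17", "11:32:02"], format="%H:%M:%S"),
    },
    index=[10, 30, 37],
)
rm_work["comp_time"] = rm_work["AFT"] - rm_work["AST"]

# Boolean mask: True where the duration is zero or positive.
keep = rm_work["comp_time"] >= pd.Timedelta(0)
rm_work = rm_work[keep]
print(rm_work.index.tolist())  # [10, 37]
```

Note that comparisons with NaT evaluate to False, so rows whose comp_time is NaT (for example, times coerced by errors='coerce') are dropped by this mask as well.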

You need to compare against a value of the appropriate data type, in this case a Timedelta.
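A small sketch of why the data type matters, using a hypothetical two-value Series. In recent pandas versions, comparing a timedelta64 Series with a plain integer raises TypeError, while comparing it with pd.Timedelta(0) returns a boolean mask:

```python
import pandas as pd

# Hypothetical durations: one positive, one negative.
comp_time = pd.Series([pd.Timedelta(minutes=90), pd.Timedelta(minutes=-235)])

# Comparing timedelta64 values against a plain integer is rejected by pandas.
try:
    comp_time >= 0
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# Comparing against a Timedelta works and returns a boolean Series.
mask = comp_time >= pd.Timedelta(0)
print(mask.tolist())  # [True, False]
```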

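Use .apply() to check whether each pandas time is negative (be sure to compare against pd.Timedelta(0) rather than plain 0, because comparing with 0 raises an error). If the value is negative, return a numpy NaN. Finally, drop the rows that contain NaN.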

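This causes a problem if your column already contains NaN values that you want to keep! In that case, you can change the function to return some other value instead and then exclude that unique value.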
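As an alternative to the sentinel-value approach described above, a combined boolean mask can keep pre-existing missing values explicitly while still removing negative durations. The values and the index label 99 below are hypothetical:

```python
import pandas as pd

# Hypothetical comp_time values: positive, negative, and a pre-existing missing value (NaT).
rm_work = pd.DataFrame(
    {"comp_time": [pd.Timedelta(minutes=30), pd.Timedelta(minutes=-235), pd.NaT]},
    index=[13, 30, 99],
)

# Keep rows that are non-negative OR already missing; only negative durations are removed.
keep = rm_work["comp_time"].isna() | (rm_work["comp_time"] >= pd.Timedelta(0))
rm_work = rm_work[keep]
print(rm_work.index.tolist())  # [13, 99]
```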

import numpy as np
import pandas as pd

def check_if_negative(pd_time):
    if pd_time >= pd.Timedelta(0):  # positive time and 0 time
        return pd_time
    elif pd_time < pd.Timedelta(0):  # negative time
        return np.nan  # np.NaN was removed in NumPy 2.0; np.nan works in all versions
    else:
        # Neither comparison is True for NaT (e.g. a value coerced by errors='coerce')
        print(f'problem! {pd_time} has an issue')  # quick error check
        return pd_time

rm_work['comp_time'] = rm_work['AFT'] - rm_work['AST']  # create timedelta
rm_work['comp_time'] = rm_work.comp_time.apply(check_if_negative)  # apply check to column

rm_work = rm_work.dropna(subset=['comp_time'])  # delete rows with NaN
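The same "replace negatives with NaN, then drop" idea can be expressed without a row-by-row function: Series.where keeps values where the condition holds and inserts NaT elsewhere. A sketch on hypothetical in-memory data (three rows from the question's table), offered as a vectorized alternative rather than the answer's own method:

```python
import pandas as pd

# Hypothetical stand-in for rm_work built from three rows of the question's table.
rm_work = pd.DataFrame(
    {
        "AST": pd.to_datetime(["08:30:00", "14:32:08", "10:00:00"], format="%H:%M:%S"),
        "AFT": pd.to_datetime(["09:00:00", "10:37:17", "10:30:00"], format="%H:%M:%S"),
    },
    index=[13, 30, 27],
)
rm_work["comp_time"] = rm_work["AFT"] - rm_work["AST"]

# Series.where keeps values where the condition is True and puts NaT elsewhere,
# mirroring check_if_negative but without a Python-level loop over rows.
rm_work["comp_time"] = rm_work["comp_time"].where(rm_work["comp_time"] >= pd.Timedelta(0))
rm_work = rm_work.dropna(subset=["comp_time"])
print(rm_work.index.tolist())  # [13, 27]
```

Unlike apply, this keeps the column's timedelta64 dtype throughout.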