How do I compare dates between two columns within a range of days and perform a task?

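Each time there is an UNKNOWN in df, I want to take that UNKNOWN row's delivery date and check it against the earliest delivery date in df2 (grouped by car_part) to see whether it falls within a +/- 90 day range. If the dates match, print the date; otherwise move on to the next UNKNOWN.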

import pandas as pd

data = {'car_part': ['100009','100093','100071','100033','100033','100043'],
        'car_number': ['UNKNOWN', 'X123-00027C', 'X123-00027C', 'UNKNOWN', 'X123-00148C', 'X123-00148C'],
        'delivery': ['11/20/2004', '12/17/2009', '7/27/2010', '11/1/2004', '9/5/2004', '11/10/2004'],
        'test': ['12/17/2009', '7/27/2010', '7/10/2020', '12/22/2006', '3/26/2007', '12/1/2007']}  

data2 = {'delivery': ['11/1/2004', '12/1/2004', '1/1/2005', '7/1/2006', '8/1/2006', '9/2/2006'], 
         'car_part': ['100009','100009','100009','100033','100033','100033']}  

df = pd.DataFrame(data)
print(df)
df2 = pd.DataFrame(data2)
print(df2)

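df['delivery'] = df['delivery'].astype('datetime64[ns]')
# sort_values returns a new DataFrame; the result is not assigned, so df keeps its original order
df.sort_values(by = ['car_part', 'delivery', 'test'], ascending=[True, True, True])

df2['delivery'] = df2['delivery'].astype('datetime64[ns]')
# same here: df2 itself is left unsorted
df2.sort_values(by = ['car_part', 'delivery'], ascending=[True, True])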

I tried this:

df["delivery"] = pd.to_datetime(df["delivery"])
df2["delivery"] = pd.to_datetime(df2["delivery"])
for index, row in df.iterrows():
    if row['car_number'] == "UNKNOWN":
        oldest_date = df["car_part"].map(df2.groupby("car_part")["delivery"].min())
        diff = (row['delivery']-oldest_date).days
        if diff<91:
            print(row['delivery']) 

But I get the error AttributeError: 'Series' object has no attribute 'days'

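Try:

  1. Use groupby and min to get the earliest delivery date for each car part.
  2. Compute the difference between df's delivery date and that earliest delivery date.
  3. Keep the earliest date only where the car number is UNKNOWN and the delivery falls within 90 days of the earliest date.

oldest = df["car_part"].map(df2.groupby("car_part")["delivery"].min())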

df["oldest"] = oldest.where(df["car_number"].eq("UNKNOWN")&df["delivery"].sub(oldest).abs().dt.days.le(90))

>>> df
  car_part   car_number   delivery        test     oldest
0   100009      UNKNOWN 2004-11-20  12/17/2009 2004-11-01
1   100093  X123-00027C 2009-12-17   7/27/2010        NaT
2   100071  X123-00027C 2010-07-27   7/10/2020        NaT
3   100033      UNKNOWN 2004-11-01  12/22/2006        NaT
4   100033  X123-00148C 2004-09-05   3/26/2007        NaT
5   100043  X123-00148C 2004-11-10   12/1/2007        NaT
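
As a side note on the AttributeError in the question: subtracting a whole Series of dates from a single date yields a Series of timedeltas, and a Series exposes day counts through the `.dt` accessor rather than a `.days` attribute. A minimal sketch with illustrative values (the names `single` and `many` are made up for this example, not taken from the question):

```python
import pandas as pd

# A scalar subtraction gives one Timedelta, which has a .days attribute
single = pd.Timestamp("2004-11-20") - pd.Timestamp("2004-11-01")
print(single.days)

# Subtracting a Series gives a Series of timedeltas, one per row
many = pd.Timestamp("2004-11-20") - pd.Series(pd.to_datetime(["2004-11-01", "2006-07-01"]))
print(many.dt.days.tolist())   # day counts via the .dt accessor
print(hasattr(many, "days"))   # the Series itself has no .days attribute
```

This is why the vectorised line above uses `.dt.days`, while a loop that compares one row against one date can use `.days` directly.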

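Change your code as follows. I don't fully understand the final output or exactly what you are asking, but your map call is wrong: it returns a whole Series rather than a single date. Since you want to keep the same code structure, the lookup line should look like this: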

df["delivery"] = pd.to_datetime(df["delivery"])
df2["delivery"] = pd.to_datetime(df2["delivery"])
for index, row in df.iterrows():
    if row['car_number'] == "UNKNOWN":
        # filter df2 (not df) down to this row's car_part, then take its earliest delivery
        oldest_date = df2.loc[df2["car_part"] == row["car_part"], "delivery"].min()
        diff = (row['delivery'] - oldest_date).days
        # abs() covers both directions of the +/- 90 day window
        if abs(diff) <= 90:
            print(row['delivery'])
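
If you want to reuse the loop or collect the matches instead of printing them, it could be wrapped in a small function. The sketch below is only one way to do it: `unknown_matches` and `window_days` are hypothetical names, and the two frames are a trimmed-down version of the question's data with the dates already parsed.

```python
import pandas as pd

def unknown_matches(df, df2, window_days=90):
    """Hypothetical helper: delivery dates of UNKNOWN rows that fall within
    window_days of the earliest df2 delivery for the same car_part."""
    earliest = df2.groupby("car_part")["delivery"].min()
    matches = []
    for _, row in df[df["car_number"] == "UNKNOWN"].iterrows():
        oldest = earliest.get(row["car_part"])  # None if the part is absent from df2
        if oldest is not None and abs((row["delivery"] - oldest).days) <= window_days:
            matches.append(row["delivery"])
    return matches

# Reduced sample data (assumed for illustration)
df = pd.DataFrame({
    "car_part": ["100009", "100033"],
    "car_number": ["UNKNOWN", "UNKNOWN"],
    "delivery": pd.to_datetime(["2004-11-20", "2004-11-01"]),
})
df2 = pd.DataFrame({
    "car_part": ["100009", "100009", "100033"],
    "delivery": pd.to_datetime(["2004-11-01", "2004-12-01", "2006-07-01"]),
})
print(unknown_matches(df, df2))
```

Computing the per-part minimum once, outside the loop, avoids filtering df2 again for every UNKNOWN row.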