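How do I compare dates between two columns within a range of days and perform a task?
Each time there is an UNKNOWN in df, I want to take that row's delivery date and compare it with the oldest delivery date in df2 (grouped by car_part) to see whether the two fall within +-90 days of each other. If the dates match, print the date; otherwise move on to the next UNKNOWN.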
import pandas as pd

data = {'car_part': ['100009','100093','100071','100033','100033','100043'],
        'car_number': ['UNKNOWN', 'X123-00027C', 'X123-00027C', 'UNKNOWN', 'X123-00148C', 'X123-00148C'],
        'delivery': ['11/20/2004', '12/17/2009', '7/27/2010', '11/1/2004', '9/5/2004', '11/10/2004'],
        'test': ['12/17/2009', '7/27/2010', '7/10/2020', '12/22/2006', '3/26/2007', '12/1/2007']}
data2 = {'delivery': ['11/1/2004', '12/1/2004', '1/1/2005', '7/1/2006', '8/1/2006', '9/2/2006'],
         'car_part': ['100009','100009','100009','100033','100033','100033']}
df = pd.DataFrame(data)
print(df)
df2 = pd.DataFrame(data2)
print(df2)
# Convert the delivery strings to datetimes.
# Note: sort_values returns a new DataFrame; the result is not assigned here,
# so df and df2 keep their original row order.
df['delivery'] = df['delivery'].astype('datetime64[ns]')
df.sort_values(by = ['car_part', 'delivery', 'test'], ascending=[True, True, True])
df2['delivery'] = df2['delivery'].astype('datetime64[ns]')
df2.sort_values(by = ['car_part', 'delivery'], ascending=[True, True])
I tried this:
df["delivery"] = pd.to_datetime(df["delivery"])
df2["delivery"] = pd.to_datetime(df2["delivery"])
for index, row in df.iterrows():
    if row['car_number'] == "UNKNOWN":
        oldest_date = df["car_part"].map(df2.groupby("car_part")["delivery"].min())
        diff = (row['delivery']-oldest_date).days
        if diff<91:
            print(row['delivery'])
But I get the error AttributeError: 'Series' object has no attribute 'days'
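Try:
- Use groupby and min to get the oldest delivery date for each car part.
- Compute the difference between the delivery date in df and that oldest delivery date.
- Keep the oldest date only where the car number is UNKNOWN and the delivery date is within 90 days (before or after) of the oldest date.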
oldest = df["car_part"].map(df2.groupby("car_part")["delivery"].min())
df["oldest"] = oldest.where(df["car_number"].eq("UNKNOWN") & df["delivery"].sub(oldest).abs().dt.days.le(90))
>>> df
  car_part   car_number   delivery        test     oldest
0   100009      UNKNOWN 2004-11-20  12/17/2009 2004-11-01
1   100093  X123-00027C 2009-12-17   7/27/2010        NaT
2   100071  X123-00027C 2010-07-27   7/10/2020        NaT
3   100033      UNKNOWN 2004-11-01  12/22/2006        NaT
4   100033  X123-00148C 2004-09-05   3/26/2007        NaT
5   100043  X123-00148C 2004-11-10   12/1/2007        NaT
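If the goal is only to print the matching UNKNOWN delivery dates, as the question describes, the same steps can be sketched without adding a column. This is a minimal sketch rather than a definitive version: the names within_window and matched are made up for illustration, and the test column is left out because it plays no part in the comparison.

```python
import pandas as pd

df = pd.DataFrame({
    "car_part": ["100009", "100093", "100071", "100033", "100033", "100043"],
    "car_number": ["UNKNOWN", "X123-00027C", "X123-00027C",
                   "UNKNOWN", "X123-00148C", "X123-00148C"],
    "delivery": ["11/20/2004", "12/17/2009", "7/27/2010",
                 "11/1/2004", "9/5/2004", "11/10/2004"],
})
df2 = pd.DataFrame({
    "delivery": ["11/1/2004", "12/1/2004", "1/1/2005",
                 "7/1/2006", "8/1/2006", "9/2/2006"],
    "car_part": ["100009", "100009", "100009", "100033", "100033", "100033"],
})
df["delivery"] = pd.to_datetime(df["delivery"])
df2["delivery"] = pd.to_datetime(df2["delivery"])

# Oldest df2 delivery per car_part, aligned to the rows of df
# (NaT where the part does not appear in df2).
oldest = df["car_part"].map(df2.groupby("car_part")["delivery"].min())

# True where the two dates are at most 90 days apart in either direction;
# comparisons involving NaT evaluate to False.
within_window = df["delivery"].sub(oldest).abs().le(pd.Timedelta(days=90))

# Hypothetical name: delivery dates of UNKNOWN rows that fall in the window.
matched = df.loc[df["car_number"].eq("UNKNOWN") & within_window, "delivery"]
for date in matched:
    print(date.date())
```

With the sample data, only row 0 qualifies: its delivery date is 19 days after the oldest df2 date for part 100009, while the UNKNOWN row for part 100033 is well over a year away from its oldest date.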
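Change your code as follows. I don't really understand the final output or exactly what you're asking, but your map call is wrong: it returns a whole Series rather than a single date, which is why .days fails. Since you want to keep the same code structure, the lookup line should look like this:
df["delivery"] = pd.to_datetime(df["delivery"])
df2["delivery"] = pd.to_datetime(df2["delivery"])
for index, row in df.iterrows():
    if row['car_number'] == "UNKNOWN":
        # Filter df2 (not df) to this row's car_part, then take its oldest delivery date
        oldest_date = df2[df2["car_part"]==row["car_part"]].groupby("car_part")["delivery"].min().values[0]
        diff = (row['delivery']-oldest_date).days
        # abs() covers both sides of the +-90-day window
        if abs(diff) <= 90:
            print(row['delivery'])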
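The AttributeError in the question comes down to the difference between a scalar Timedelta and a Series of timedeltas. The short sketch below illustrates that difference with made-up variable names and two dates taken from the sample data; it is an illustration, not part of either answer.

```python
import pandas as pd

delivery = pd.Timestamp("2004-11-20")
oldest_scalar = pd.Timestamp("2004-11-01")
oldest_series = pd.Series(pd.to_datetime(["2004-11-01", "2006-07-01"]))

# Timestamp minus Timestamp gives a single Timedelta, which has .days.
scalar_diff = delivery - oldest_scalar
print(scalar_diff.days)

# Timestamp minus Series gives a Series of timedeltas. A Series has no
# .days attribute; the per-element day counts live under the .dt accessor.
series_diff = delivery - oldest_series
print(hasattr(series_diff, "days"))
print(series_diff.dt.days.tolist())
```

Inside a row-by-row loop the subtraction needs a scalar on both sides, so the lookup has to return one date; in the vectorized form the whole column is subtracted at once and .dt.days is used instead.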