Elegant way to add years as timedelta units to shift dates - Pandas

I have a dataframe as shown below

import pandas as pd

df1 = pd.DataFrame({'person_id': [11,11,11,21,21],
                        'admit_dates': ['03/21/2015', '01/21/2016', '7/20/2018','01/11/2017','12/31/2011'],
                        'discharge_dates': ['05/09/2015', '01/29/2016', '7/27/2018','01/12/2017','01/31/2016'],
                        'drug_start_dates': ['05/29/1967', '01/21/1957', '7/27/1959','01/01/1961','12/31/1961'],
                        'offset':[223,223,223,310,310]})

What I want to do is add the offset, expressed in years, to the date columns.

So I tried converting the offset to a timedelta object using unit=y and unit=Y, and then shifting admit_dates:

df1['offset'] = pd.to_timedelta(df1['offset'],unit='Y') #also tried with `y` (small y)
df1['shifted_date'] = df1['admit_dates'] + df1['offset']

However, I get the following error:

ValueError: Units 'M' and 'Y' are no longer supported, as they do not represent unambiguous timedelta values durations.

Is there another elegant way to shift dates by a number of years?

One thing you can do is extract the year from each date, add the offset to it, and rebuild the date:

import pandas as pd

df1 = pd.DataFrame({'person_id': [11,11,11,21,21],
                        'admit_dates': ['03/21/2015', '01/21/2016', '7/20/2018','01/11/2017','12/31/2011'],
                        'discharge_dates': ['05/09/2015', '01/29/2016', '7/27/2018','01/12/2017','01/31/2016'],
                        'drug_start_dates': ['05/29/1967', '01/21/1957', '7/27/1959','01/01/1961','12/31/1961'],
                        'offset':[10,20,2,31,12]})
# Parse the date strings into datetime64 values
df1.admit_dates = pd.to_datetime(df1.admit_dates)

# Target year = original year + offset
df1["new_year"] = df1.admit_dates.dt.year + df1.offset
# Reassemble a date from the new year and the original month and day
df1["date_with_offset"] = pd.to_datetime(pd.DataFrame({"year": df1.new_year,
                                                  "month": df1.admit_dates.dt.month,
                                                  "day": df1.admit_dates.dt.day}))
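One caveat with rebuilding dates from year, month and day: an admit date of February 29 whose target year is not a leap year has no valid day, so the assembly step would raise an error. A minimal alternative sketch swaps in pd.DateOffset(years=n), a calendar-aware offset that is not what the answer above uses; it clips such dates to February 28 instead. It assumes the shifted dates stay inside the nanosecond Timestamp range, so it reuses this answer's small offsets, and the column name shifted_date is only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'admit_dates': ['03/21/2015', '01/21/2016', '7/20/2018', '01/11/2017', '12/31/2011'],
                    'offset': [10, 20, 2, 31, 12]})

# Parse the month/day/year strings into datetime64 values
admit = pd.to_datetime(df1['admit_dates'], format='%m/%d/%Y')

# Build one calendar-aware DateOffset per row and add it to the matching date
df1['shifted_date'] = [d + pd.DateOffset(years=int(n))
                       for d, n in zip(admit, df1['offset'])]

print(df1)

# Leap-day handling: Feb 29 shifted into a non-leap year lands on Feb 28
print(pd.Timestamp('2016-02-29') + pd.DateOffset(years=1))  # 2017-02-28 00:00:00
```

Building the offsets row by row is slower than a fully vectorised operation, but each row can have its own number of years.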

The catch: with your original offsets, some of the dates raise the following error:

OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 2328-01-11 00:00:00

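According to the documentation, the latest date pandas can represent is April 11, 2262 (just before midnight, to be exact). This is because pandas stores timestamps as 64-bit integer counts of nanoseconds, and that fixed-width nanosecond representation is what runs out of range here.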
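The 2262 limit follows directly from that representation: a signed 64-bit integer holds at most 2**63 - 1 nanoseconds on either side of 1970-01-01. A quick check of the arithmetic, assuming an average Gregorian year of 365.2425 days for the conversion:

```python
import pandas as pd

# Largest value a signed 64-bit nanosecond counter can hold
max_ns = 2**63 - 1

# Convert to years using the average Gregorian year (365.2425 days)
ns_per_year = 365.2425 * 24 * 60 * 60 * 10**9
span_years = max_ns / ns_per_year

print(round(span_years, 2))  # 292.28, i.e. about 292 years after 1970
print(pd.Timestamp.max)      # 2262-04-11 23:47:16.854775807
```

1970 plus roughly 292.28 years lands in April 2262, which matches pd.Timestamp.max.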

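The largest Timestamp supported by pandas is Timestamp('2262-04-11 23:47:16.854775807'), so you cannot add 310 years to 12/31/2011. One possible workaround is to use Python's datetime objects, which support years up to 9999, so adding 310 years works.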

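import pandas as pd
from dateutil.relativedelta import relativedelta

# Parse the strings, then drop to plain datetime.date objects (object dtype),
# which are not limited to the nanosecond Timestamp range
df1['admit_dates'] = pd.to_datetime(df1['admit_dates'])
# Add one relativedelta(years=n) per row; date + relativedelta is applied element-wise
df1['admit_dates'] = df1['admit_dates'].dt.date.add(
    df1['offset'].apply(lambda y: relativedelta(years=y)))

Result:

df1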
   person_id admit_dates discharge_dates drug_start_dates  offset
0         11  2238-03-21      05/09/2015       05/29/1967     223
1         11  2239-01-21      01/29/2016       01/21/1957     223
2         11  2241-07-20       7/27/2018        7/27/1959     223
3         21  2327-01-11      01/12/2017       01/01/1961     310
4         21  2321-12-31      01/31/2016       12/31/1961     310
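The same idea on single values, to make the behaviour of relativedelta explicit: a plain datetime.date can go far past 2262, and relativedelta clips February 29 to February 28 when the target year is not a leap year. A small sketch using only datetime and dateutil:

```python
from datetime import date
from dateutil.relativedelta import relativedelta

# 310 years from the last row's admit date: beyond the nanosecond Timestamp limit,
# but well within what datetime.date supports
far = date(2011, 12, 31) + relativedelta(years=310)
print(far)  # 2321-12-31

# Feb 29 shifted into a non-leap year is clipped to the end of February
clipped = date(2016, 2, 29) + relativedelta(years=1)
print(clipped)  # 2017-02-28

# datetime.date itself tops out at year 9999
print(date.max)  # 9999-12-31
```

Keep in mind the resulting column has object dtype, so the .dt accessor and other vectorised datetime operations no longer apply to it.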

The units 'Y' and 'M' have been deprecated since pandas 0.25.0, but thanks to numpy's timedelta64 we can still use them by wrapping one in a pandas Timedelta:

import pandas as pd
import numpy as np

# Builds your dataframe
df1 = pd.DataFrame({'person_id': [11,11,11,21,21],
                    'admit_dates': ['03/21/2015', '01/21/2016', '7/20/2018','01/11/2017','12/31/2011'],
                    'discharge_dates': ['05/09/2015', '01/29/2016', '7/27/2018','01/12/2017','01/31/2016'],
                    'drug_start_dates': ['05/29/1967', '01/21/1957', '7/27/1959','01/01/1961','12/31/1961'],
                    'offset':[223,223,223,310,310]})

>>> df1
   person_id admit_dates discharge_dates drug_start_dates  offset
0         11  03/21/2015      05/09/2015       05/29/1967     223
1         11  01/21/2016      01/29/2016       01/21/1957     223
2         11   7/20/2018       7/27/2018        7/27/1959     223
3         21  01/11/2017      01/12/2017       01/01/1961     310
4         21  12/31/2011      01/31/2016       12/31/1961     310

>>> df1['shifted_date'] = df1.apply(lambda r: pd.Timedelta(np.timedelta64(r['offset'], 'Y'))+ pd.to_datetime(r['admit_dates']), axis=1)
>>> df1['shifted_date'] = df1['shifted_date'].dt.date
>>> df1
   person_id admit_dates discharge_dates drug_start_dates  offset shifted_date
0         11  03/21/2015      05/09/2015       05/29/1967     223   2238-03-21
1         11  01/21/2016      01/29/2016       01/21/1957     223   2239-01-21
2         11   7/20/2018       7/27/2018        7/27/1959     223   2241-07-20
....
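Two things worth checking with this approach. First, numpy has to turn the nonlinear 'Y' unit into a fixed length before it can be added to a date, and it uses the average Gregorian year of 365.2425 days, so the shift is approximate: the result is generally off by some hours from the calendar answer and can fall on a neighbouring day. Second, at nanosecond resolution a pd.Timedelta can only span about 292 years, so a 310-year offset does not fit. The sketch below uses plain numpy datetime64 values to show the drift over a single year:

```python
import numpy as np

# numpy will not add a 'Y' timedelta to a second-resolution datetime directly,
# so convert the year to seconds first (astype casts unsafely by default)
one_year = np.timedelta64(1, 'Y').astype('timedelta64[s]')
print(one_year)  # 31556952 seconds, i.e. 365.2425 days

# 2016 is a leap year, so one calendar year after this date is 366 days away
start = np.datetime64('2015-03-21T00:00:00', 's')
print(start + one_year)  # 2016-03-20T05:49:12
```

A calendar-aware shift such as relativedelta(years=1) would give 2016-03-21 here instead.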