Adding months to a date which is bigger than the limit of Timestamp type
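I have a df where one column holds dates of type datetime64[ns].
I want to add months to this column, taking the number of months from another column of the dataframe: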
df['date_shifted'] = df['date'].values.astype('datetime64[M]') + df['months'].values.astype('timedelta64[M]')
My problem comes when the result exceeds the maximum value of the datetime64[ns] type and I get the following error:
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 2846-04-30 00:00:00
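For context, here is a minimal check of where the limit comes from (assuming a reasonably recent NumPy and pandas). `datetime64[ns]` stores nanoseconds since 1970 in a 64-bit integer, so `pd.Timestamp` only spans roughly the years 1677 to 2262, while NumPy can do the same month arithmetic at month resolution without overflowing:

```python
import numpy as np
import pandas as pd

# datetime64[ns] packs nanoseconds since 1970-01-01 into an int64, which caps the range
print(pd.Timestamp.min)  # 1677-09-21 00:12:43.145224193
print(pd.Timestamp.max)  # 2262-04-11 23:47:16.854775807

# At month resolution NumPy handles the shift itself without trouble;
# the overflow only appears once the result has to fit into nanoseconds
shifted = np.datetime64('2017-01', 'M') + np.timedelta64(9999, 'M')
print(shifted)  # 2850-04
```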
Is there any way to get around this error and add the months I need to my dataframe?
Some sample data that would trigger the error after the calculation:
date | months |
---|---|
28-01-2017 | 9999 |
13-05-2018 | 9999 |
22-03-2016 | 9999 |
12-05-2007 | 9999 |
Note: I know I can coerce the errors to NaT, but I need the dates for subsequent calculations.
You can use periods, as described in the Representing out-of-bounds spans section of the guide on timestamps posted by @HenryEcker in the comments. To convert the column, simply use .dt.to_period():
>>> df['date'].dt.to_period(freq='M')
0 2017-01
1 2018-05
2 2016-03
3 2007-05
Name: date, dtype: period[M]
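A self-contained version of this conversion, building the frame from the sample table in the question. Parsing the strings with `format='%d-%m-%Y'` is an assumption that the dates are written day-first:

```python
import pandas as pd

# Sample data from the question; dates assumed to be DD-MM-YYYY
df = pd.DataFrame({
    'date': pd.to_datetime(['28-01-2017', '13-05-2018', '22-03-2016', '12-05-2007'],
                           format='%d-%m-%Y'),
    'months': [9999, 9999, 9999, 9999],
})

# Monthly periods are stored as month counts rather than nanoseconds,
# so they are not bound by the datetime64[ns] range
monthly = df['date'].dt.to_period(freq='M')
print(monthly)
```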
The rest is simple: adding the int64 months works directly, without any further conversion:
>>> df['shifted_date'] = df['date'].dt.to_period(freq='M') + df['months']
>>> df
date months shifted_date
0 2017-01-28 9999 2850-04
1 2018-05-13 9999 2851-08
2 2016-03-22 9999 2849-06
3 2007-05-12 9999 2840-08
>>> df['shifted_date']
0 2850-04
1 2851-08
2 2849-06
3 2840-08
Name: shifted_date, dtype: period[M]
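Since the goal is to keep using the shifted dates in later calculations, it may help that period columns expose their components through `.dt` without going back to timestamps. A minimal sketch on the same sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(['28-01-2017', '13-05-2018', '22-03-2016', '12-05-2007'],
                           format='%d-%m-%Y'),
    'months': [9999, 9999, 9999, 9999],
})

# Adding an integer Series to a period[M] Series shifts each element by that many months
df['shifted_date'] = df['date'].dt.to_period(freq='M') + df['months']

# Year and month stay accessible without converting back to datetime64[ns]
print(df['shifted_date'].dt.year.tolist())   # [2850, 2851, 2849, 2840]
print(df['shifted_date'].dt.month.tolist())  # [4, 8, 6, 8]
```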
Depending on what your dates represent, you can switch to a finer-grained period:
>>> df['shifted_date'].astype('Period[D]')
0 2850-04-30
1 2851-08-31
2 2849-06-30
3 2840-08-31
Name: shifted_date, dtype: period[D]
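The `astype('Period[D]')` output above lands on the last day of each month. If you want to choose explicitly, `.dt.asfreq` takes a `how` argument. The last part of this sketch is a hypothetical extension, not part of the answer: it reuses the original day of month, clamped to the length of the target month, so day-level information is not lost:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(['28-01-2017', '13-05-2018', '22-03-2016', '12-05-2007'],
                           format='%d-%m-%Y'),
    'months': [9999, 9999, 9999, 9999],
})
monthly = df['date'].dt.to_period(freq='M') + df['months']

# Day-level periods pinned to the first or last day of each shifted month
first_day = monthly.dt.asfreq('D', how='start')
last_day = monthly.dt.asfreq('D', how='end')

# Hypothetical extension: keep the original day of month, clamped to the
# target month's length (e.g. a 31st shifted into a 30-day month becomes the 30th)
day = np.minimum(df['date'].dt.day, monthly.dt.days_in_month)
same_day = first_day + (day - 1)

print([str(p) for p in first_day])  # ['2850-04-01', '2851-08-01', '2849-06-01', '2840-08-01']
print([str(p) for p in last_day])   # ['2850-04-30', '2851-08-31', '2849-06-30', '2840-08-31']
print([str(p) for p in same_day])   # ['2850-04-28', '2851-08-13', '2849-06-22', '2840-08-12']
```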
Converting back to datetimes, however, triggers the very overflow you are trying to avoid:
>>> df['shifted_date'].dt.start_time
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib64/python3.8/site-packages/pandas/core/accessor.py", line 78, in _getter
return self._delegate_property_get(name)
File "/usr/lib64/python3.8/site-packages/pandas/core/indexes/accessors.py", line 70, in _delegate_property_get
result = getattr(values, name)
File "/usr/lib64/python3.8/site-packages/pandas/core/arrays/period.py", line 420, in start_time
return self.to_timestamp(how="start")
File "/usr/lib64/python3.8/site-packages/pandas/core/arrays/period.py", line 465, in to_timestamp
new_data = libperiod.periodarr_to_dt64arr(new_data.asi8, base)
File "pandas/_libs/tslibs/period.pyx", line 977, in pandas._libs.tslibs.period.periodarr_to_dt64arr
File "pandas/_libs/tslibs/conversion.pyx", line 246, in pandas._libs.tslibs.conversion.ensure_datetime64ns
File "pandas/_libs/tslibs/np_datetime.pyx", line 113, in pandas._libs.tslibs.np_datetime.check_dts_bounds
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 2850-04-01 00:00:00
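If you really need date objects rather than periods, one workaround (an assumption on my part, not something the timestamps guide prescribes) is an object column of Python `datetime.date` values, which support years 1 to 9999. The trade-off is that operations on it run element by element in Python rather than vectorised:

```python
import datetime
import pandas as pd

# Day-level periods like the ones produced above, rebuilt so this block runs on its own
shifted = pd.Series(pd.PeriodIndex(['2850-04-30', '2851-08-31', '2849-06-30', '2840-08-31'],
                                   freq='D'))

# Build plain datetime.date objects from each Period's components (object dtype column)
as_dates = pd.Series([datetime.date(p.year, p.month, p.day) for p in shifted],
                     index=shifted.index)

# Ordinary date arithmetic then works on the individual values
gap = as_dates.iloc[1] - as_dates.iloc[0]
print(as_dates.dtype)  # object
print(gap.days)
```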