How can I make Pandas convert a column which contains NaT from timedelta to datetime?
I have a pandas dataframe with a column of type timedelta64[ns], which I would like to convert to datetime64[ns].
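The pd.to_datetime() function claims to be able to do this, and it used to work, but now it seems to fail. I suspect this is down to some API quirk I have missed. Currently it fails with: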
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.7/site-packages/pandas/core/tools/datetimes.py", line 724, in to_datetime
cache_array = _maybe_cache(arg, format, cache, convert_listlike)
File "/usr/lib/python3.7/site-packages/pandas/core/tools/datetimes.py", line 152, in _maybe_cache
cache_dates = convert_listlike(unique_dates, format)
File "/usr/lib/python3.7/site-packages/pandas/core/tools/datetimes.py", line 363, in _convert_listlike_datetimes
arg, _ = maybe_convert_dtype(arg, copy=False)
File "/usr/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py", line 1916, in maybe_convert_dtype
raise TypeError(f"dtype {data.dtype} cannot be converted to datetime64[ns]")
TypeError: dtype timedelta64[ns] cannot be converted to datetime64[ns]
To reproduce it, use the MWE below:
wget https://chymera.eu/ppb/61ebad.csv
python
import pandas as pd
df = pd.read_csv('61ebad.csv')
df['Animal_death_date'] = pd.to_timedelta(df['Animal_death_date'], errors='coerce')
df['Animal_death_date'] = pd.to_datetime(df['Animal_death_date'], errors='coerce')
The error also occurs if I use errors='ignore'.
For reference, I am using Pandas 1.0.1.
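To see the same failure without downloading the file, the sketch below builds a tiny timedelta series by hand. The values are made up (not taken from 61ebad.csv), and it uses the default errors='raise'; exact messages may differ between pandas versions.

```python
import pandas as pd

# Hypothetical sample data: one duration plus one missing value (coerced to NaT)
deltas = pd.to_timedelta(pd.Series(["132 days 19:00:00", None]), errors="coerce")
print(deltas.dtype)

raised = False
try:
    # A duration has no anchor in time, so pd.to_datetime raises TypeError
    pd.to_datetime(deltas)
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```

The NaT is not the cause here: the dtype itself is rejected.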
If you need to convert the timedeltas to datetimes, add some start datetime:
import pandas as pd
df = pd.read_csv('https://chymera.eu/ppb/61ebad.csv')
start = pd.to_datetime('2000-01-01')
df['Animal_death_date'] = pd.to_timedelta(df['Animal_death_date'], errors='coerce') + start
print(df['Animal_death_date'])
0 NaT
1 NaT
2 NaT
3 NaT
4 NaT
...
843 NaT
844 NaT
845 2000-05-12 19:00:00
846 2000-05-12 19:00:00
847 2000-05-12 19:00:00
Name: Animal_death_date, Length: 848, dtype: datetime64[ns]
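This works because adding a duration to a fixed Timestamp yields a point in time, and NaT simply propagates through the addition. A minimal sketch on made-up values: the "132 days 19:00:00" string is an assumption chosen to match the 2000-05-12 19:00:00 output above, not read from the CSV.

```python
import pandas as pd

# Hypothetical durations; None is coerced to NaT
deltas = pd.to_timedelta(pd.Series(["132 days 19:00:00", None]), errors="coerce")

# Anchor every duration to the same (arbitrary) origin
start = pd.Timestamp("2000-01-01")
dates = deltas + start

print(dates)
```

The origin is arbitrary: the resulting dates only mean something if the durations were originally measured from that origin.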
Or add a column that holds datetimes:
import pandas as pd
df = pd.read_csv('https://chymera.eu/ppb/61ebad.csv')
start = pd.to_datetime(df['FMRIMeasurement_date'])
df['Animal_death_date'] = pd.to_timedelta(df['Animal_death_date'], errors='coerce') + start
print(df['Animal_death_date'])
0 NaT
1 NaT
2 NaT
3 NaT
4 NaT
...
843 NaT
844 NaT
845 2018-10-04 19:20:54
846 2018-10-04 19:20:54
847 2018-10-04 19:20:54
Name: Animal_death_date, Length: 848, dtype: datetime64[ns]
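The column-based variant relies on index alignment: each duration is added to the start date in the same row, and a NaT on either side gives NaT. A sketch with a small hypothetical DataFrame (column names reused from the question, values invented):

```python
import pandas as pd

# Hypothetical frame reusing the question's column names with invented values
df = pd.DataFrame({
    "FMRIMeasurement_date": ["2018-01-01", "2018-06-01"],
    "Animal_death_date": ["1 days 12:00:00", None],
})

start = pd.to_datetime(df["FMRIMeasurement_date"])
offset = pd.to_timedelta(df["Animal_death_date"], errors="coerce")

# Element-wise, aligned on the row index: row 1 has no duration, so it becomes NaT
df["Animal_death_date"] = start + offset
print(df["Animal_death_date"])
```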
A small correction first: your source column is actually a text column, merely formatted like a timedelta.
To convert the Animal_death_date column, define the following function:
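import numpy as np
import pandas as pd

def myDateConv(tt):
    # Empty cells become NaN; otherwise prefix the year so %j reads the day count
    return pd.to_datetime('2020-' + tt, format='%Y-%j days %X.%f') if len(tt) > 0 else np.nan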
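I assumed your dates are from the current year (2020), hence 2020 as the initial part of the whole date string. If they are from some other year, change this prefix accordingly.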
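One detail worth checking with this format string: %j is the 1-based day of the year, not a count of elapsed days. A value parsed this way therefore lands one day earlier than adding the same number of days as a duration to January 1st, as the first answer does. A minimal check using only the standard library (the year 2020 and the value 132 are illustrative):

```python
from datetime import date, datetime, timedelta

# %j: day 132 of the year, where day 1 is January 1st
as_day_of_year = datetime.strptime("2020-132", "%Y-%j").date()

# Duration: 132 days after January 1st
as_elapsed = date(2020, 1, 1) + timedelta(days=132)

print(as_day_of_year)  # 2020-05-11
print(as_elapsed)      # 2020-05-12
```

Which interpretation is correct depends on how the values in the CSV were produced, which the question does not say.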
But apply this function as early as possible, i.e. while reading the source file:
df = pd.read_csv('61ebad.csv', index_col=0, parse_dates=['Treatment_start_date',
'Treatment_end_date', 'FMRIMeasurement_date', 'OpenFieldTestMeasurement_date',
'ForcedSwimTestMeasurement_date', 'CageStay_start_date', 'Cage_Treatment_start_date',
'Cage_Treatment_end_date', 'SucrosePreferenceMeasurement_date', 'reference_date'],
converters = { 'Animal_death_date': myDateConv })
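Note the additional parameters:
index_col - treat the initial column as the index,
parse_dates - convert dates in the "normal" format to datetime,
converters - apply the above function to the source Animal_death_date column.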
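The converters mechanism can be tried without the original file. The sketch below feeds a small in-memory CSV (hypothetical columns and values) to read_csv. The converter receives the raw cell text, including an empty string for a missing cell, which is why the function above checks len(tt) > 0. To sidestep the %j format string, this hypothetical converter (to_death_date) swaps in pd.Timedelta added to an assumed fixed origin (ORIGIN); it is a variant for illustration, not the answer's exact function.

```python
import io

import pandas as pd

# Hypothetical CSV: the second data row has an empty Animal_death_date cell
csv_text = """ID,FMRIMeasurement_date,Animal_death_date
1,2018-01-01,1 days 12:00:00
2,2018-06-01,
"""

ORIGIN = pd.Timestamp("2020-01-01")  # assumed origin; adjust to your data

def to_death_date(tt):
    # read_csv hands the converter the raw string; empty cells arrive as ""
    if len(tt) > 0:
        return ORIGIN + pd.Timedelta(tt)
    return pd.NaT

df = pd.read_csv(
    io.StringIO(csv_text),
    index_col=0,
    parse_dates=["FMRIMeasurement_date"],
    converters={"Animal_death_date": to_death_date},
)
print(df)
```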
In my opinion, this solution is simpler and more readable than converting particular columns separately.