Unable to convert a column to datetime

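I have tried many of the suggestions here, but none of them solved my problem. I have two columns containing observations like this: 15:08:19

If I write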

df.time_entry.describe() 

I get:

count       814262
unique       56765
top       15:03:00
freq           103
Name: time_entry, dtype: object

I have already run this code:

df['time_entry'] = pd.to_datetime(df['time_entry'],format= '%H:%M:%S', errors='ignore' ).dt.time

But re-running the describe code still returns dtype: object

What is the purpose of dt.time?

Just remove dt.time, and your conversion from object to datetime will work fine.

df['time_entry'] = pd.to_datetime(df['time_entry'],format= '%H:%M:%S')

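The problem is that you are using the datetime accessor (.dt) with the time attribute. That turns every value into a plain datetime.time object, so the column goes back to dtype object, and you then cannot subtract the two columns from each other. So just omit .dt.time and it will work.

Here is some data with 2 columns of strings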
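To make the difference concrete, here is a minimal sketch that compares the two forms side by side. The two-row series and the variable names (raw, as_datetime, as_time and so on) are made up for illustration; only pd.to_datetime and the .dt.time attribute come from the question.

```python
import pandas as pd

# Hypothetical two-row sample in the same HH:MM:SS format as the question
raw = pd.Series(['12:01:00', '15:03:00'])
raw2 = pd.Series(['13:03:00', '14:04:00'])

as_datetime = pd.to_datetime(raw, format='%H:%M:%S')
as_datetime2 = pd.to_datetime(raw2, format='%H:%M:%S')

# .dt.time turns each value into a plain datetime.time object,
# so the resulting column goes back to dtype object
as_time = as_datetime.dt.time
as_time2 = as_datetime2.dt.time
print(as_time.dtype)

# datetime columns subtract cleanly into a timedelta column
diff = as_datetime - as_datetime2
print(diff.dtype)

# datetime.time objects do not support subtraction
try:
    as_time - as_time2
    subtraction_failed = False
except TypeError as exc:
    subtraction_failed = True
    print('subtraction failed:', exc)
```

If the goal is only to display the time of day, .dt.time is fine; for arithmetic between columns, keeping the datetime dtype is the safer choice.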

import numpy as np
import pandas as pd

df = pd.DataFrame()
df['time_entry'] = ['12:01:00', '15:03:00', '16:43:00', '14:11:00']
df['time_entry2'] = ['13:03:00', '14:04:00', '19:23:00', '18:12:00']

print(df)
  time_entry time_entry2
0   12:01:00    13:03:00
1   15:03:00    14:04:00
2   16:43:00    19:23:00
3   14:11:00    18:12:00

Convert both columns to datetime dtype

df['time_entry'] = pd.to_datetime(df['time_entry'], format='%H:%M:%S')
df['time_entry2'] = pd.to_datetime(df['time_entry2'], format='%H:%M:%S')

print(df)
           time_entry         time_entry2
0 1900-01-01 12:01:00 1900-01-01 13:03:00
1 1900-01-01 15:03:00 1900-01-01 14:04:00
2 1900-01-01 16:43:00 1900-01-01 19:23:00
3 1900-01-01 14:11:00 1900-01-01 18:12:00

print(df.dtypes)
time_entry     datetime64[ns]
time_entry2    datetime64[ns]
dtype: object
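The 1900-01-01 in the output above is only a placeholder date: the format string contains no date fields, so a default date gets filled in. Python's standard-library datetime.strptime does the same thing, which the short sketch below illustrates (the variable name parsed is just for illustration).

```python
from datetime import datetime

# Parse a time-only string; no year, month, or day is given in the format
parsed = datetime.strptime('12:01:00', '%H:%M:%S')

# The missing date fields default to 1900-01-01
print(parsed)  # 1900-01-01 12:01:00
```

Since both columns get the same placeholder date, it cancels out when one column is subtracted from the other.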

(Optional) Localize both columns to a time zone

df['time_entry'] = df['time_entry'].dt.tz_localize('US/Central')
df['time_entry2'] = df['time_entry2'].dt.tz_localize('US/Central')

Now take the time difference (subtraction) between the 2 columns and get the difference in days (as a float)

  • Method 1 gives Diff_days1
  • Method 2 gives Diff_days2
  • Method 3 gives Diff_days3

df['Diff_days1'] = (df['time_entry'] - df['time_entry2']).dt.total_seconds()/60/60/24
df['Diff_days2'] = (df['time_entry'] - df['time_entry2']) / np.timedelta64(1, 'D')
df['Diff_days3'] = (df['time_entry'].sub(df['time_entry2'])).dt.total_seconds()/60/60/24

print(df)
           time_entry         time_entry2  Diff_days1  Diff_days2  Diff_days3
0 1900-01-01 12:01:00 1900-01-01 13:03:00   -0.043056   -0.043056   -0.043056
1 1900-01-01 15:03:00 1900-01-01 14:04:00    0.040972    0.040972    0.040972
2 1900-01-01 16:43:00 1900-01-01 19:23:00   -0.111111   -0.111111   -0.111111
3 1900-01-01 14:11:00 1900-01-01 18:12:00   -0.167361   -0.167361   -0.167361
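As a sanity check on the first row, the same arithmetic can be reproduced with only the standard library. This is a sketch for checking the numbers by hand, not part of the pandas workflow above; the names fmt, entry, entry2, delta and diff_days are illustrative.

```python
from datetime import datetime

fmt = '%H:%M:%S'
entry = datetime.strptime('12:01:00', fmt)
entry2 = datetime.strptime('13:03:00', fmt)

# 12:01:00 minus 13:03:00 is minus 62 minutes, i.e. -3720 seconds
delta = entry - entry2

# seconds -> minutes -> hours -> days
diff_days = delta.total_seconds() / 60 / 60 / 24
print(round(diff_days, 6))  # -0.043056
```

The result matches the -0.043056 shown for row 0 in all three Diff_days columns.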

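EDIT

If you are trying to access datetime attributes, you can do so directly on the time_entry column (rather than on the time-difference column). Here is an example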

df['day1'] = df['time_entry'].dt.day
df['time1'] = df['time_entry'].dt.time
df['minute1'] = df['time_entry'].dt.minute
df['dayofweek1'] = df['time_entry'].dt.weekday
df['day2'] = df['time_entry2'].dt.day
df['time2'] = df['time_entry2'].dt.time
df['minute2'] = df['time_entry2'].dt.minute
df['dayofweek2'] = df['time_entry2'].dt.weekday

print(df[['day1', 'time1', 'minute1', 'dayofweek1',
        'day2', 'time2', 'minute2', 'dayofweek2']])
   day1     time1  minute1  dayofweek1  day2     time2  minute2  dayofweek2
0     1  12:01:00        1           0     1  13:03:00        3           0
1     1  15:03:00        3           0     1  14:04:00        4           0
2     1  16:43:00       43           0     1  19:23:00       23           0
3     1  14:11:00       11           0     1  18:12:00       12           0