Pandas to_datetime raises warning: tzname CET identified but not understood

I have a pandas DataFrame with a column containing string timestamps of the following form:

31 Jan 2020 17:29:37 CET

09 Apr 2021 15:34:53 CEST

Converting the column to timestamps returns a warning:

df['timestamp'] = pd.to_datetime(df["timestamp"])
c:\Users\user\Miniconda3\lib\site-packages\dateutil\parser\_parser.py:1213: UnknownTimezoneWarning: tzname CET identified but not understood.  Pass `tzinfos` argument in order to correctly return a timezone-aware datetime.  In a future version, this will raise an exception.
  warnings.warn("tzname {tzname} identified but not understood.  "
c:\Users\user\Miniconda3\lib\site-packages\dateutil\parser\_parser.py:1213: UnknownTimezoneWarning: tzname CEST identified but not understood.  Pass `tzinfos` argument in order to correctly return a timezone-aware datetime.  In a future version, this will raise an exception.
  warnings.warn("tzname {tzname} identified but not understood.  "

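I have seen many discussions of this warning, but I couldn't find a solution in the context of pandas' to_datetime() method. Can anyone help? Normalizing the timestamps accurately is crucial, because I then want to sort the DataFrame by this column.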

Thanks!

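Probably the most efficient approach is to strip the time zone abbreviation, parse to datetime, and localize to the correct time zone: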

import pandas as pd
df = pd.DataFrame({'datetime': ["31 Jan 2020 17:29:37 CET",
                                "09 Apr 2021 15:34:53 CEST"]})

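# strip the trailing abbreviation (and the space before it) so pandas parses a naive wall time
df['datetime'] = df['datetime'].str.replace(r'\s*CES?T$', '', regex=True)
# localize to Europe/Berlin, which applies CET or CEST depending on the date
df['datetime'] = pd.to_datetime(df['datetime']).dt.tz_localize('Europe/Berlin')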

df['datetime']
0   2020-01-31 17:29:37+01:00
1   2021-04-09 15:34:53+02:00
Name: datetime, dtype: datetime64[ns, Europe/Berlin]
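Stripping the abbreviation does discard information in one case: on the autumn changeover day, wall times between 02:00 and 03:00 occur twice in Europe/Berlin, and `tz_localize` raises for them by default (`ambiguous='raise'`). A minimal sketch, assuming the same CET/CEST suffixes, that keeps the abbreviation as a DST flag and passes it to the `ambiguous` parameter (a boolean array where True means DST). The sample timestamps are made up for illustration:

```python
import pandas as pd

# Hypothetical sample: 02:30 on 2021-10-31 occurs twice in Europe/Berlin
# (first in CEST, then in CET) because clocks fall back from 03:00 to 02:00.
s = pd.Series(["31 Oct 2021 02:30:00 CEST",
               "31 Oct 2021 02:30:00 CET",
               "31 Jan 2020 17:29:37 CET"])

# Remember which rows carried the summer-time abbreviation before stripping it.
is_dst = s.str.endswith("CEST").to_numpy()

# Parse the naive wall time without the abbreviation.
naive = pd.to_datetime(s.str.replace(r"\s*CES?T$", "", regex=True))

# For ambiguous wall times, True selects DST (CEST, +02:00) and False selects
# standard time (CET, +01:00); unambiguous rows are localized normally.
localized = naive.dt.tz_localize("Europe/Berlin", ambiguous=is_dst)
print(localized)
```

The flag is only consulted for ambiguous times, so the other rows localize exactly as in the answer above.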

Alternatively, you can use dateutil's parser here, but only via apply. This is especially useful if you have data from multiple time zones (i.e. not just CET/CEST):

import pandas as pd
import dateutil.parser
import dateutil.tz

df = pd.DataFrame({'datetime': ["31 Jan 2020 17:29:37 CET",
                                "09 Apr 2021 15:34:53 CEST"]})

# define a 'real' time zone for each abbreviation:
tzmapping = {'CET': dateutil.tz.gettz('Europe/Berlin'),
             'CEST': dateutil.tz.gettz('Europe/Berlin')}

df['datetime'] = df['datetime'].apply(dateutil.parser.parse, tzinfos=tzmapping)

df['datetime']
0   2020-01-31 17:29:37+01:00
1   2021-04-09 15:34:53+02:00
Name: datetime, dtype: datetime64[ns, tzfile('/usr/share/zoneinfo/Europe/Berlin')]
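If you would rather avoid the row-by-row `apply`, one vectorized alternative is to swap each abbreviation for a fixed numeric UTC offset and let `pd.to_datetime` parse it with `%z`. This is a sketch under an assumption: the `offsets` table below is hypothetical and must be extended for any other abbreviations in your data (abbreviations are not globally unique, and an unknown one raises `KeyError`). Passing `utc=True` normalizes every row to a single UTC dtype, which is what makes sorting reliable when the offsets differ:

```python
import pandas as pd

df = pd.DataFrame({'datetime': ["09 Apr 2021 15:34:53 CEST",
                                "31 Jan 2020 17:29:37 CET"]})

# Assumed mapping from abbreviation to fixed UTC offset; extend as needed.
offsets = {'CET': '+0100', 'CEST': '+0200'}

# Replace the trailing abbreviation with its offset, e.g. "... CET" -> "... +0100".
with_offset = df['datetime'].str.replace(
    r'\b([A-Z]{3,4})$', lambda m: offsets[m.group(1)], regex=True)

# %z reads the numeric offset; utc=True converts all rows to one UTC dtype.
df['datetime'] = pd.to_datetime(with_offset, format='%d %b %Y %H:%M:%S %z', utc=True)

# Sorting on the UTC column orders rows by the actual instant.
df = df.sort_values('datetime')
print(df)
```

Note that this trusts the abbreviation literally instead of applying the zone's rules; if you want local times for display, call `.dt.tz_convert('Europe/Berlin')` on the column afterwards.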