pandas to_datetime raises warning: tzname CET identified but not understood
I have a pandas DataFrame with a column containing timestamps as strings in the following form:
31 Jan 2020 17:29:37 CET
09 Apr 2021 15:34:53 CEST
Converting the column to timestamps produces a warning:
df['timestamp'] = pd.to_datetime(df["timestamp"])
c:\Users\user\Miniconda3\lib\site-packages\dateutil\parser\_parser.py:1213: UnknownTimezoneWarning: tzname CET identified but not understood. Pass `tzinfos` argument in order to correctly return a timezone-aware datetime. In a future version, this will raise an exception.
warnings.warn("tzname {tzname} identified but not understood. "
c:\Users\user\Miniconda3\lib\site-packages\dateutil\parser\_parser.py:1213: UnknownTimezoneWarning: tzname CEST identified but not understood. Pass `tzinfos` argument in order to correctly return a timezone-aware datetime. In a future version, this will raise an exception.
warnings.warn("tzname {tzname} identified but not understood. "
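I have seen a lot of discussion about this warning, but could not find a solution in the context of pandas' to_datetime() method. Can anyone help? Normalizing the timestamps accurately is crucial, because I want to sort the DataFrame by this column afterwards.
Thanks!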
Probably the most efficient way is to strip the time zone abbreviation, parse to datetime and localize to the correct time zone:
import pandas as pd

df = pd.DataFrame({'datetime': ["31 Jan 2020 17:29:37 CET",
                                "09 Apr 2021 15:34:53 CEST"]})

# strip the trailing abbreviation together with the space in front of it
df['datetime'] = df['datetime'].str.replace(r'\s*(?:CEST|CET)$', '', regex=True)
# parse the naive strings, then attach the real time zone
df['datetime'] = pd.to_datetime(df['datetime']).dt.tz_localize('Europe/Berlin')

df['datetime']
0   2020-01-31 17:29:37+01:00
1   2021-04-09 15:34:53+02:00
Name: datetime, dtype: datetime64[ns, Europe/Berlin]
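One caveat with dropping the abbreviation: during the autumn change-over hour a wall-clock time such as 02:30 occurs twice, and tz_localize cannot tell which one is meant without the CET/CEST hint (by default it raises an error for such values). Below is a minimal sketch of one way to keep that hint. It assumes every string ends in either CET or CEST and that month names are parsed under an English locale. The sample rows and the names raw, is_dst, naive and aware are made up for illustration.

```python
import pandas as pd

# made-up sample: the first two rows share the same wall-clock time
raw = pd.Series(["25 Oct 2020 02:30:00 CEST",
                 "25 Oct 2020 02:30:00 CET",
                 "31 Jan 2020 17:29:37 CET"])

# remember which rows were summer time before the abbreviation is removed
is_dst = raw.str.endswith("CEST")

# strip the abbreviation and parse the remaining naive timestamp
naive = pd.to_datetime(raw.str.replace(r"\s*(?:CEST|CET)$", "", regex=True),
                       format="%d %b %Y %H:%M:%S")

# a boolean array for `ambiguous` marks DST (True) vs. non-DST (False);
# pandas only consults it for timestamps that are actually ambiguous
aware = naive.dt.tz_localize("Europe/Berlin", ambiguous=is_dst.to_numpy())

print(aware)
```

The first two rows have the same wall-clock time but end up one hour apart, so sorting on the result keeps them in true chronological order.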
Alternatively, you can use dateutil's parser here, but only via apply. This is especially useful if you have data from multiple time zones (not just CET/CEST, for example):
import pandas as pd
import dateutil.parser  # import the submodules explicitly rather than relying on pandas having loaded them
import dateutil.tz

df = pd.DataFrame({'datetime': ["31 Jan 2020 17:29:37 CET",
                                "09 Apr 2021 15:34:53 CEST"]})

# define a 'real' time zone for each abbreviation:
tzmapping = {'CET': dateutil.tz.gettz('Europe/Berlin'),
             'CEST': dateutil.tz.gettz('Europe/Berlin')}

# extra keyword arguments to apply are passed on to dateutil.parser.parse
df['datetime'] = df['datetime'].apply(dateutil.parser.parse, tzinfos=tzmapping)

df['datetime']
0   2020-01-31 17:29:37+01:00
1   2021-04-09 15:34:53+02:00
Name: datetime, dtype: datetime64[ns, tzfile('/usr/share/zoneinfo/Europe/Berlin')]
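Since the goal in the question is sorting, it can help to convert everything to UTC after parsing, because a column that mixes several time zones otherwise tends to end up as generic Python objects. Below is a rough sketch. It assumes EST/EDT should mean America/New_York; abbreviations are ambiguous, so that mapping is my assumption. The extra row and the utc column are made-up sample data and names.

```python
import dateutil.parser
import dateutil.tz
import pandas as pd

# abbreviation -> real time zone; the EST/EDT entries are an assumption
tzmapping = {
    "CET": dateutil.tz.gettz("Europe/Berlin"),
    "CEST": dateutil.tz.gettz("Europe/Berlin"),
    "EST": dateutil.tz.gettz("America/New_York"),
    "EDT": dateutil.tz.gettz("America/New_York"),
}

df = pd.DataFrame({"datetime": ["31 Jan 2020 17:29:37 CET",
                                "31 Jan 2020 12:00:00 EST",
                                "09 Apr 2021 15:34:53 CEST"]})

# parse row by row; tzinfos is forwarded to dateutil.parser.parse
parsed = df["datetime"].apply(dateutil.parser.parse, tzinfos=tzmapping)

# normalize the mixed-zone values to a single UTC column
df["utc"] = pd.to_datetime(parsed, utc=True)

# sort chronologically on the normalized column
df = df.sort_values("utc").reset_index(drop=True)
print(df)
```

Note that the EST row has the earlier wall-clock time (12:00) but sorts after the CET row (17:29), because in UTC it is the later instant.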