pandas.to_datetime: which format to choose?
I have a .csv like this:
"Date","Time","Open","High","Low","Close","Volume"
12/30/2002,0930,0.94,0.94,0.94,0.94,571466
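I want to convert the "Time" column values with pandas.to_datetime, but I can't find the right format to use, because there is no separator between the hours and the minutes.
Can anyone help me?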
This should work, but I'm not sure whether there is a better way:
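from io import StringIO
import pandas as pd
fh = StringIO('''"Date","Time","Open","High","Low","Close","Volume"
12/30/2002,0930,0.94,0.94,0.94,0.94,571466''')
# Read "Time" as a string so the leading zero in 0930 is preserved
df = pd.read_csv(fh, dtype={'Time': object})
# Join date and time into one string and let pandas parse it
df['Timestamp'] = pd.to_datetime(df['Date'] + ' ' + df['Time'])
print(df)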
Output:
Date Time Open High Low Close Volume Timestamp
0 12/30/2002 0930 0.94 0.94 0.94 0.94 571466 2002-12-30 09:30:00
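The dtype={'Time': object} hint above matters because, without it, read_csv infers "Time" as an integer and 0930 becomes 930. If the file has already been loaded that way, a minimal sketch of one way to recover the four-digit form (the names csv_text, raw and padded are only illustrative):

```python
import io
import pandas as pd

csv_text = '''"Date","Time","Open","High","Low","Close","Volume"
12/30/2002,0930,0.94,0.94,0.94,0.94,571466'''

# Without a dtype hint, "Time" is inferred as an integer and the leading zero is lost.
raw = pd.read_csv(io.StringIO(csv_text))

# Restore the four-digit HHMM form: convert to string, then left-pad with zeros.
padded = raw["Time"].astype(str).str.zfill(4)
```

The padded strings can then be passed to pd.to_datetime in the same way as a column that was read as text from the start.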
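You can tell pandas that there is no separator by specifying the format. %H%M tells Python that the time has no separator between the hours and the minutes. If you had a : separator, for example, you would use format='%H:%M' instead.
Assuming everything is already loaded and your DataFrame is named df: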
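import pandas as pd
# file loading and such; "Time" should be read as a string (e.g. dtype={'Time': str})
# so that the leading zero in 0930 is kept
df['Date'] = pd.to_datetime(df['Date'])
# Parse HHMM, then keep only the time-of-day part
df['Time'] = pd.DatetimeIndex(pd.to_datetime(df['Time'], format='%H%M')).time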
This gives you:
Date Time Open High Low Close Volume
0 2002-12-30 09:30:00 0.94 0.94 0.94 0.94 571466
For Python 3 users:
df['Time'] = pd.to_datetime(df['Time'], format='%H%M').dt.time
This gives you:
Date Time Open High Low Close Volume
0 12/30/2002 09:30:00 0.94 0.94 0.94 0.94 571466
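To make the difference between the two format strings concrete, here is a small sketch. The sample values and the names no_sep and with_sep are made up for illustration and are not taken from the question's file:

```python
import datetime
import pandas as pd

# Hypothetical sample values: the same times written without and with a separator.
no_sep = pd.Series(["0930", "1545"])
with_sep = pd.Series(["09:30", "15:45"])

# The format string has to mirror the text layout exactly.
t_no_sep = pd.to_datetime(no_sep, format="%H%M").dt.time
t_with_sep = pd.to_datetime(with_sep, format="%H:%M").dt.time

# Both columns end up holding the same datetime.time objects.
expected = [datetime.time(9, 30), datetime.time(15, 45)]
```

Either way the result is a column of plain datetime.time values, so the choice of format depends only on how the text is written in the file.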
You can pass a list of lists to the parse_dates parameter to name the columns that should be combined and parsed as a single full datetime:
In [6]:
import io
import pandas as pd
t='''"Date","Time","Open","High","Low","Close","Volume"
12/30/2002,0930,0.94,0.94,0.94,0.94,571466'''
df = pd.read_csv(io.StringIO(t), parse_dates=[['Date','Time']], keep_date_col=True)
df
Out[6]:
Date_Time Date Time Open High Low Close Volume
0 2002-12-30 09:30:00 12/30/2002 0930 0.94 0.94 0.94 0.94 571466
You can see that the dtypes are as expected:
In [7]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 8 columns):
Date_Time 1 non-null datetime64[ns]
Date 1 non-null object
Time 1 non-null object
Open 1 non-null float64
High 1 non-null float64
Low 1 non-null float64
Close 1 non-null float64
Volume 1 non-null int64
dtypes: datetime64[ns](1), float64(4), int64(1), object(2)
memory usage: 144.0+ bytes
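As far as I am aware, recent pandas releases deprecate combining columns through a nested parse_dates list together with keep_date_col, so the call above may warn or fail depending on the installed version. A hedged sketch of an equivalent result that avoids both options, assuming the dates are always month/day/year and the times always four-digit HHMM:

```python
import io
import pandas as pd

t = '''"Date","Time","Open","High","Low","Close","Volume"
12/30/2002,0930,0.94,0.94,0.94,0.94,571466'''

# Read both columns as text so nothing is reinterpreted before parsing.
df = pd.read_csv(io.StringIO(t), dtype={"Date": str, "Time": str})

# Build the combined column with an explicit format (assumed: MM/DD/YYYY and HHMM).
df["Date_Time"] = pd.to_datetime(
    df["Date"] + " " + df["Time"], format="%m/%d/%Y %H%M"
)
```

The original Date and Time columns stay in the frame as strings, which mirrors what keep_date_col=True did in the example above.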