How do I read data containing an AM/PM time format with pandas.read_csv?
I have a data set like this:
Temp Hi Low Out Dew Wind Wind Wind Hi Hi Wind Heat THW THSW Rain Solar Solar Hi Solar Heat Cool In In Wind Wind ISS Arc.
Date Time Out Temp Temp Hum Pt. Speed Dir Run Speed Dir Chill Index Index Index Bar Rain Rate Rad. Energy Rad. D-D D-D Temp Hum ET Samp Tx Recept Int.
01/01/15 12:30 a 17.0 17.6 17.0 14 -10.7 30.6 N 15.29 51.5 N 15.7 14.1 10.8 8.3 741.4 0.00 0.0 0 0.00 0 0.028 0.000 26.2 2 0.00 702 1 100.0 30
01/01/15 1:00 a 16.6 17.0 16.6 14 -11.1 27.4 N 13.68 45.1 N 15.3 13.7 10.7 8.1 741.8 0.00 0.0 0 0.00 0 0.037 0.000 25.6 2 0.25 702 1 100.0 30
01/01/15 1:30 a 16.2 16.6 16.1 14 -11.4 24.1 N 12.07 35.4 N 15.0 13.4 10.7 7.9 741.9 0.00 0.0 0 0.00 0 0.044 0.000 25.1 2 0.00 703 1 100.0 30
01/01/15 2:00 a 15.6 16.2 15.6 14 -11.9 17.7 N 8.85 33.8 N 14.6 12.8 11.0 7.8 742.4 0.00 0.0 0 0.00 0 0.057 0.000 24.6 2 0.20 702 1 100.0 30
01/01/15 2:30 a 15.3 15.8 15.3 14 -12.1 16.1 N 8.05 29.0 N 14.4 12.6 11.0 7.7 742.8 0.00 0.0 0 0.00 0 0.063 0.000 24.2 2 0.00 703 1 100.0 30
01/01/15 3:00 a 14.8 15.3 14.8 15 -11.6 20.9 N 10.46 38.6 N 13.4 12.3 10.0 6.9 742.8 0.00 0.0 0 0.00 0 0.073 0.000 23.6 2 0.18 702 1 100.0 30
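I am trying to read it, but I have a problem with the time format: as you can see, the times look like 12:00 a, 1:00 a ...
I use the following to read another, similar file (one without the am/pm format):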
data = pd.read_csv(filename, skiprows=2, sep=r'\s+', header=None,
                   index_col=[0,1,2], dayfirst=True, parse_dates=True,
                   infer_datetime_format=True)
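I was thinking I could use date_parser='%D/%M/%Y %I:%M' instead of infer_datetime_format=True, but it did not work.
Any ideas?
I tried the following and I think it works, but is there a way to do this directly in read_csv?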
data = pd.read_csv(path+filename, skiprows=2, sep=r'\s+', header=None,
                   names=['date','hour','ap','a','b','c','d','e','f','g','h','i','j',
                          'k','l','m','n','o','p','q','r','s','t',
                          'u','v','w','x','y','z','aa','bb','cc'])
Then I created a time array:
time = pd.to_datetime(data['date'] + ' ' + data['hour']+data['ap'])
and
data.index = time
You can use parse_dates in its dict form:
dict, e.g. {'foo' : [1, 3]} -> parse columns 1, 3 as date and call result 'foo'
together with your own custom date_parser function.
Code:
import pandas as pd

def parse_dt(dt, tm, ap):
    # join date, time and the a/p marker, e.g. '01/01/15 12:30a'
    return pd.to_datetime(dt + ' ' + tm + ap, dayfirst=True)

df = pd.read_csv(filename, sep=r'\s+', skiprows=2, header=None,
                 parse_dates={'ts': [0,1,2]}, date_parser=parse_dt)
Output:
In [44]: df
Out[44]:
ts 3 4 5 6 7 8 9 10 11 ... \
0 2015-01-01 00:30:00 17.0 17.6 17.0 14 -10.7 30.6 N 15.29 51.5 ...
1 2015-01-01 01:00:00 16.6 17.0 16.6 14 -11.1 27.4 N 13.68 45.1 ...
2 2015-01-01 01:30:00 16.2 16.6 16.1 14 -11.4 24.1 N 12.07 35.4 ...
3 2015-01-01 02:00:00 15.6 16.2 15.6 14 -11.9 17.7 N 8.85 33.8 ...
4 2015-01-01 02:30:00 15.3 15.8 15.3 14 -12.1 16.1 N 8.05 29.0 ...
5 2015-01-01 03:00:00 14.8 15.3 14.8 15 -11.6 20.9 N 10.46 38.6 ...
22 23 24 25 26 27 28 29 30 31
0 0 0.028 0.0 26.2 2 0.00 702 1 100.0 30
1 0 0.037 0.0 25.6 2 0.25 702 1 100.0 30
2 0 0.044 0.0 25.1 2 0.00 703 1 100.0 30
3 0 0.057 0.0 24.6 2 0.20 702 1 100.0 30
4 0 0.063 0.0 24.2 2 0.00 703 1 100.0 30
5 0 0.073 0.0 23.6 2 0.18 702 1 100.0 30
[6 rows x 30 columns]
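Newer pandas releases deprecate date_parser and the dict form of parse_dates, so here is a rough alternative that reads the three columns as plain text and builds the timestamp afterwards with an explicit format. It is only a sketch under a few assumptions: the dates are day-first (as dayfirst=True above suggests), the marker is a single letter a or p, and the names read_with_ampm and SAMPLE are made up for illustration. It also shows the directives the question was reaching for: %d/%m/%y for the date, %I for the 12-hour clock, and %p for AM/PM, which is why the marker is expanded to AM/PM first.

```python
import io

import pandas as pd

# Trimmed sample: two placeholder header lines, then rows cut down to a few
# columns. The last row is a made-up PM row to exercise the 'p' marker.
SAMPLE = """\
header line 1
header line 2
01/01/15 12:30 a 17.0 17.6
01/01/15 1:00 a 16.6 17.0
01/01/15 1:30 p 16.2 16.6
"""


def read_with_ampm(source, skiprows=2):
    """Read a whitespace-separated file whose first three columns are
    date, time and a one-letter a/p marker; return a frame indexed by
    the combined timestamp (hypothetical helper, not a pandas API)."""
    df = pd.read_csv(source, sep=r"\s+", skiprows=skiprows, header=None)
    # Expand 'a'/'p' to 'AM'/'PM' so that the %p directive can match it.
    marker = df[2].astype(str).str.upper() + "M"
    stamp = df[0].astype(str) + " " + df[1].astype(str) + " " + marker
    df.index = pd.to_datetime(stamp, format="%d/%m/%y %I:%M %p")
    df.index.name = "ts"
    # The raw date/time/marker columns are no longer needed.
    return df.drop(columns=[0, 1, 2])


df = read_with_ampm(io.StringIO(SAMPLE))
print(df)
```

Giving an explicit format avoids per-row guessing, and 12:30 a ends up as 00:30 rather than 12:30. For a real file, pass its path instead of the StringIO object.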