How to drop datapoints from index
I'm new to Python, and I have a dataset S2 that contains dates. When I use the command:
available_datapoints = S2.index
followed by
print(available_datapoints)
it yields:
<class 'pandas.tseries.index.DatetimeIndex'>
[2017-05-07 00:00:00+00:00, ..., 2017-07-27 23:50:00+00:00]
Length: 11808, Freq: 10T, Timezone: UTC
However, I want to start at 2017-11-07 00:00:00+00:00 instead of 2017-05-07 00:00:00+00:00, and to stop at 2017-07-22 23:50:00+00:00 instead of 2017-07-27 23:50:00+00:00.
Does anyone know how I can change this?
I think you can use DataFrame.truncate:
import pandas as pd

# Sample data: 11808 rows at 10-minute intervals, starting 2017-05-07
# ('10T' means 10 minutes; recent pandas versions spell this '10min')
S2 = pd.DataFrame({'a': range(11808)},
                  index=pd.date_range(start='2017-05-07', periods=11808, freq='10T'))
print(S2.head())
                     a
2017-05-07 00:00:00  0
2017-05-07 00:10:00  1
2017-05-07 00:20:00  2
2017-05-07 00:30:00  3
2017-05-07 00:40:00  4
print(S2.tail())
                         a
2017-07-27 23:10:00  11803
2017-07-27 23:20:00  11804
2017-07-27 23:30:00  11805
2017-07-27 23:40:00  11806
2017-07-27 23:50:00  11807
# Keep only the rows between the two timestamps (both bounds are inclusive)
S2 = S2.truncate(before='2017-07-11', after='2017-07-22 23:50:00')
print(S2.head())
                        a
2017-07-11 00:00:00  9360
2017-07-11 00:10:00  9361
2017-07-11 00:20:00  9362
2017-07-11 00:30:00  9363
2017-07-11 00:40:00  9364
print(S2.tail())
                         a
2017-07-22 23:10:00  11083
2017-07-22 23:20:00  11084
2017-07-22 23:30:00  11085
2017-07-22 23:40:00  11086
2017-07-22 23:50:00  11087
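Note that the sample index above is timezone-naive, while the index in the question is in UTC. Here is a minimal sketch of the same truncate call on a UTC-aware index, passing timezone-aware bounds so the comparison is unambiguous. The names S2_utc, start, stop and trimmed are only illustrative, and the data is a stand-in for the real dataset. Depending on the pandas version, naive bounds against a tz-aware index may warn or raise, so aware bounds seem like the safer choice.

```python
import pandas as pd

# Hypothetical stand-in for the asker's data: a UTC-aware 10-minute index.
idx = pd.date_range(start='2017-05-07', periods=11808, freq='10min', tz='UTC')
S2_utc = pd.DataFrame({'a': range(len(idx))}, index=idx)

# Timezone-aware bounds that match the timezone of the index.
start = pd.Timestamp('2017-07-11 00:00', tz='UTC')
stop = pd.Timestamp('2017-07-22 23:50', tz='UTC')

# truncate keeps every row from start to stop, both ends included.
trimmed = S2_utc.truncate(before=start, after=stop)
print(trimmed.index[0], trimmed.index[-1], len(trimmed))
```

The result should cover 12 full days of 10-minute samples, i.e. 1728 rows.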
Assuming you really meant to start at '2017-07-11' rather than '2017-11-07' (which falls after your end date of '2017-07-22 23:50'), you can use partial string indexing:
Setup:
import pandas as pd

df = pd.DataFrame(index=pd.date_range('2017-05-07 00:00:00+00:00', '2017-07-27 23:50:00+00:00', freq='10T'))
print(df.index)
DatetimeIndex(['2017-05-07 00:00:00+00:00', '2017-05-07 00:10:00+00:00',
               '2017-05-07 00:20:00+00:00', '2017-05-07 00:30:00+00:00',
               '2017-05-07 00:40:00+00:00', '2017-05-07 00:50:00+00:00',
               '2017-05-07 01:00:00+00:00', '2017-05-07 01:10:00+00:00',
               '2017-05-07 01:20:00+00:00', '2017-05-07 01:30:00+00:00',
               ...
               '2017-07-27 22:20:00+00:00', '2017-07-27 22:30:00+00:00',
               '2017-07-27 22:40:00+00:00', '2017-07-27 22:50:00+00:00',
               '2017-07-27 23:00:00+00:00', '2017-07-27 23:10:00+00:00',
               '2017-07-27 23:20:00+00:00', '2017-07-27 23:30:00+00:00',
               '2017-07-27 23:40:00+00:00', '2017-07-27 23:50:00+00:00'],
              dtype='datetime64[ns, UTC]', length=11808, freq='10T')
Now, use partial string indexing with a slice:
df1 = df['2017-07-11':'2017-07-22 23:50:00']
print(df1.index)
Output: a smaller DataFrame, with the times before 2017-07-11 and after 2017-07-22 23:50 dropped:
DatetimeIndex(['2017-07-11 00:00:00+00:00', '2017-07-11 00:10:00+00:00',
               '2017-07-11 00:20:00+00:00', '2017-07-11 00:30:00+00:00',
               '2017-07-11 00:40:00+00:00', '2017-07-11 00:50:00+00:00',
               '2017-07-11 01:00:00+00:00', '2017-07-11 01:10:00+00:00',
               '2017-07-11 01:20:00+00:00', '2017-07-11 01:30:00+00:00',
               ...
               '2017-07-22 22:20:00+00:00', '2017-07-22 22:30:00+00:00',
               '2017-07-22 22:40:00+00:00', '2017-07-22 22:50:00+00:00',
               '2017-07-22 23:00:00+00:00', '2017-07-22 23:10:00+00:00',
               '2017-07-22 23:20:00+00:00', '2017-07-22 23:30:00+00:00',
               '2017-07-22 23:40:00+00:00', '2017-07-22 23:50:00+00:00'],
              dtype='datetime64[ns, UTC]', length=1728, freq='10T')
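One detail that may help: with partial string slicing, the end bound is inclusive, and a date-only string such as '2017-07-22' covers that whole day. The sketch below rebuilds the same kind of index and compares the two spellings of the end bound. It uses .loc for the row slice, which is the more explicit form, and the names full_end and day_end are just for illustration.

```python
import pandas as pd

# Same kind of index as in the setup above: 10-minute steps in UTC.
idx = pd.date_range('2017-05-07 00:00', '2017-07-27 23:50', freq='10min', tz='UTC')
df = pd.DataFrame(index=idx)

# End bound given down to the last timestamp that should be kept.
full_end = df.loc['2017-07-11':'2017-07-22 23:50:00']

# End bound given as a date only; the slice runs through the end of that day.
day_end = df.loc['2017-07-11':'2017-07-22']

print(len(full_end), len(day_end))
print(full_end.index.equals(day_end.index))
```

Both slices should select the same 1728 rows, so the shorter date-only form is enough whenever you want whole days.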