如何重新采样 csv 以便它可以与我的其他 csv 完美对齐
How to resample a csv so it can perfectly align with my other csv
我正在处理项目的另一个方面,但我使用从 Mesowest 下载的 csv 文件进行实验。
新密码是
df_pirates_all = pd.read_csv(
"https://cdn.touringplans.com/datasets/pirates_of_caribbean_dlr.csv",usecols=['date','datetime','SPOSTMIN'],
parse_dates=['date', 'datetime'],
)
df_pirates_all['ride'] = 'pirates'
df_pirates_all['open'] = ~((df_pirates_all['SPOSTMIN'] == -999))
df_pirates = df_pirates_all.set_index('datetime').sort_index()
df_pirates = df_pirates.loc['2017-01-01 06:00':'2017-02-01 00:00']
c = df_pirates.groupby(level=0).transform("count")
c[c["date"]>1].index.tolist()
df_pirates = df_pirates[~df_pirates.index.isin(c[c["date"]>1].index.tolist())].resample('10Min').fillna("nearest",limit=1)
wxdataadd="C:/Users/stratus/Downloads/DisneyJanuary2017Wx.csv"
wx=pd.read_csv(wxdataadd)
wxdata=wx.resample('10Min')
temp=wxdata['air_temp_set_1']
time=wxdata['Date_Time']
wxtest=pd.concat([df_pirates, temp,time])
wxtest=wxtest.set_index([df_pirates,temp]).unstack()
print (wxtest)
但是我得到了
TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'
关于天气数据的重采样,因为 wxdata 在列中有值:
Index(['Date_Time', 'air_temp_set_1', 'relative_humidity_set_1',
'wind_speed_set_1', 'wind_direction_set_1', 'wind_gust_set_1',
'precip_accum_since_local_midnight_set_1'],
dtype='object')
这是每 5 分钟一次,当我想将它设置为每 10 分钟甚至 15 分钟以与骑行数据很好地对齐时。
这是wx的前几行:
不过我知道wx里面的时间和盗版的不一致
- 出现此问题是因为
.resample
需要日期时间索引
.resample
最后还需要某种类型的聚合函数,例如 .sum()
或 .mean()
.
- 对于此数据,由于您通过从 5 分钟到 10 分钟的间隔重新采样来降低分辨率,因此使用
.mean()
作为聚合函数更有意义
- 可能还有其他问题,但这是导致
TypeError
。
import pandas as pd
wxdataadd="C:/Users/stratus/Downloads/DisneyJanuary2017Wx.csv"
# convert Date_Time to datetime when reading file
wx = pd.read_csv(wxdataadd, parse_dates=['Date_Time'])
# set Date_Time to the index
wx.set_index('Date_Time', inplace=True)
# resample the weather data from 5 minutes to 10 minutes, using mean as the agg function
wx = wx.resample('10Min').mean() # there needs to be an agg function
# join the air temp data to df_pirates on the datetime index
wxtest = df_pirates.join(wx.air_temp_set_1)
# this is an example of the wxtest dataframe
# display(wxtest.head())
SPOSTMIN ride open air_temp_set_1
2017-01-01 09:10:00 5.0 pirates True 51.539172
2017-01-01 09:20:00 5.0 pirates True 42.690527
2017-01-01 09:30:00 5.0 pirates True 46.879581
2017-01-01 09:40:00 5.0 pirates True 41.588848
2017-01-01 09:50:00 5.0 pirates True 45.575068
示例数据
import pandas as pd
import numpy as np
from datetime import datetime
# sample weather data in 5 minutes intervals
np.random.seed(45)
wxdata = {'Date_Time': pd.bdate_range(datetime(2017, 1, 1), freq='5Min', periods=10), 'air_temp_set_1': np.random.uniform(40, 55, size=(10,))}
wx = pd.DataFrame(wxdata)
wx.Date_Time = wx.Date_Time + pd.Timedelta(hours=9, minutes=10)
# display(wx.head())
air_temp_set_1
Date_Time
2017-01-01 09:10:00 54.835173
2017-01-01 09:15:00 48.243171
2017-01-01 09:20:00 44.221710
2017-01-01 09:25:00 41.159343
2017-01-01 09:30:00 46.667042
# pirates data in 10 minutes intervals
pirates = {pd.Timestamp('2017-01-01 09:10:00', freq='10T'): {'SPOSTMIN': 5.0, 'ride': 'pirates', 'open': True},
pd.Timestamp('2017-01-01 09:20:00', freq='10T'): {'SPOSTMIN': 5.0, 'ride': 'pirates', 'open': True},
pd.Timestamp('2017-01-01 09:30:00', freq='10T'): {'SPOSTMIN': 5.0, 'ride': 'pirates', 'open': True},
pd.Timestamp('2017-01-01 09:40:00', freq='10T'): {'SPOSTMIN': 5.0, 'ride': 'pirates', 'open': True},
pd.Timestamp('2017-01-01 09:50:00', freq='10T'): {'SPOSTMIN': 5.0, 'ride': 'pirates', 'open': True}}
df_pirates = pd.DataFrame.from_dict(pirates, orient='index')
# display(df_pirates)
SPOSTMIN ride open
2017-01-01 09:10:00 5.0 pirates True
2017-01-01 09:20:00 5.0 pirates True
2017-01-01 09:30:00 5.0 pirates True
2017-01-01 09:40:00 5.0 pirates True
2017-01-01 09:50:00 5.0 pirates True
我正在处理项目的另一个方面,但我使用从 Mesowest 下载的 csv 文件进行实验。 新密码是
df_pirates_all = pd.read_csv(
"https://cdn.touringplans.com/datasets/pirates_of_caribbean_dlr.csv",usecols=['date','datetime','SPOSTMIN'],
parse_dates=['date', 'datetime'],
)
df_pirates_all['ride'] = 'pirates'
df_pirates_all['open'] = ~((df_pirates_all['SPOSTMIN'] == -999))
df_pirates = df_pirates_all.set_index('datetime').sort_index()
df_pirates = df_pirates.loc['2017-01-01 06:00':'2017-02-01 00:00']
c = df_pirates.groupby(level=0).transform("count")
c[c["date"]>1].index.tolist()
df_pirates = df_pirates[~df_pirates.index.isin(c[c["date"]>1].index.tolist())].resample('10Min').fillna("nearest",limit=1)
wxdataadd="C:/Users/stratus/Downloads/DisneyJanuary2017Wx.csv"
wx=pd.read_csv(wxdataadd)
wxdata=wx.resample('10Min')
temp=wxdata['air_temp_set_1']
time=wxdata['Date_Time']
wxtest=pd.concat([df_pirates, temp,time])
wxtest=wxtest.set_index([df_pirates,temp]).unstack()
print (wxtest)
但是我得到了
TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'
关于天气数据的重采样,因为 wxdata 在列中有值:
Index(['Date_Time', 'air_temp_set_1', 'relative_humidity_set_1',
'wind_speed_set_1', 'wind_direction_set_1', 'wind_gust_set_1',
'precip_accum_since_local_midnight_set_1'],
dtype='object')
这是每 5 分钟一次,当我想将它设置为每 10 分钟甚至 15 分钟以与骑行数据很好地对齐时。
这是wx的前几行:
- 出现此问题是因为
.resample
需要日期时间索引.resample
最后还需要某种类型的聚合函数,例如.sum()
或.mean()
.- 对于此数据,由于您通过从 5 分钟到 10 分钟的间隔重新采样来降低分辨率,因此使用
.mean()
作为聚合函数更有意义
- 可能还有其他问题,但这是导致
TypeError
。
import pandas as pd
wxdataadd="C:/Users/stratus/Downloads/DisneyJanuary2017Wx.csv"
# convert Date_Time to datetime when reading file
wx = pd.read_csv(wxdataadd, parse_dates=['Date_Time'])
# set Date_Time to the index
wx.set_index('Date_Time', inplace=True)
# resample the weather data from 5 minutes to 10 minutes, using mean as the agg function
wx = wx.resample('10Min').mean() # there needs to be an agg function
# join the air temp data to df_pirates on the datetime index
wxtest = df_pirates.join(wx.air_temp_set_1)
# this is an example of the wxtest dataframe
# display(wxtest.head())
SPOSTMIN ride open air_temp_set_1
2017-01-01 09:10:00 5.0 pirates True 51.539172
2017-01-01 09:20:00 5.0 pirates True 42.690527
2017-01-01 09:30:00 5.0 pirates True 46.879581
2017-01-01 09:40:00 5.0 pirates True 41.588848
2017-01-01 09:50:00 5.0 pirates True 45.575068
示例数据
import pandas as pd
import numpy as np
from datetime import datetime
# sample weather data in 5 minutes intervals
np.random.seed(45)
wxdata = {'Date_Time': pd.bdate_range(datetime(2017, 1, 1), freq='5Min', periods=10), 'air_temp_set_1': np.random.uniform(40, 55, size=(10,))}
wx = pd.DataFrame(wxdata)
wx.Date_Time = wx.Date_Time + pd.Timedelta(hours=9, minutes=10)
# display(wx.head())
air_temp_set_1
Date_Time
2017-01-01 09:10:00 54.835173
2017-01-01 09:15:00 48.243171
2017-01-01 09:20:00 44.221710
2017-01-01 09:25:00 41.159343
2017-01-01 09:30:00 46.667042
# pirates data in 10 minutes intervals
pirates = {pd.Timestamp('2017-01-01 09:10:00', freq='10T'): {'SPOSTMIN': 5.0, 'ride': 'pirates', 'open': True},
pd.Timestamp('2017-01-01 09:20:00', freq='10T'): {'SPOSTMIN': 5.0, 'ride': 'pirates', 'open': True},
pd.Timestamp('2017-01-01 09:30:00', freq='10T'): {'SPOSTMIN': 5.0, 'ride': 'pirates', 'open': True},
pd.Timestamp('2017-01-01 09:40:00', freq='10T'): {'SPOSTMIN': 5.0, 'ride': 'pirates', 'open': True},
pd.Timestamp('2017-01-01 09:50:00', freq='10T'): {'SPOSTMIN': 5.0, 'ride': 'pirates', 'open': True}}
df_pirates = pd.DataFrame.from_dict(pirates, orient='index')
# display(df_pirates)
SPOSTMIN ride open
2017-01-01 09:10:00 5.0 pirates True
2017-01-01 09:20:00 5.0 pirates True
2017-01-01 09:30:00 5.0 pirates True
2017-01-01 09:40:00 5.0 pirates True
2017-01-01 09:50:00 5.0 pirates True