Manipulate day-of-week number
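I have date-indexed data and want to do more with the time dimension. I wrote a function that shifts the start of the week to a day of my choosing, so that 0 can mean Wednesday instead of Sunday. It also adds a month label to my DataFrame: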
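def date_manipulate(df, startday):
    # df must have a DatetimeIndex; startday is a position in default_week (0 = Sunday).
    df['Month'] = df.index.strftime("%B")
    df['DOW'] = df.index.strftime("%A")
    default_week = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']
    # Rotate the list so that the chosen start day comes first.
    temp_week = default_week[startday:] + default_week[:startday]
    # Map each day name to its position in the rotated week.
    week = {day: index for index, day in enumerate(temp_week)}
    df['DOW'] = df['DOW'].map(week)
    return df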
I then use groupby to aggregate by year and week.
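def data_agg(df, name):
    # cal_columns is a helper that is not shown here.
    # DatetimeIndex.week was deprecated in pandas 1.1 and removed in 2.0;
    # df.index.isocalendar().week is its replacement.
    df_monthly = df.groupby([df['name'], df.index.year, df['Month'], df.index.week, df['DOW'], df.index], sort=True)
    df_monthly = cal_columns(df_monthly)
    df_monthly.index.names = ['Name', 'Year', 'Month', 'Week', 'Day of Week', 'date']
    df_monthly.to_csv('data/{}_Aggregate.csv'.format(name))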
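This works well, except that the week number does not take into account that a 7-day week now runs Wednesday to Tuesday instead of Sunday to Saturday. I thought I could solve this with a 7-day loop counting from 0 to 6. But that creates a different problem: if the data does not cover a full week, for example there is only Wednesday, Thursday and Friday and the next Wednesday's data is missing, there is no clear marker for where this week ends or the next one begins. I feel stuck in a logic sink right now and would really appreciate some insight. Thanks.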
The result I am after looks something like this:
Week  Day of the week  randdata
1     Wednesday        1
1     Thursday         3
1     Friday           4
2     Wednesday        1
2     Saturday         5
2     Sunday           6
3     Thursday         6
3     Friday           7
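While editing this, I had a flash of insight.
Starting from Wednesday, count consecutive days by date. If there is a gap between dates, start a new week; otherwise the next Wednesday is the start of a new week.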
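One way to sidestep the gap problem is to roll every date back to the start of its custom week and count whole 7-day blocks from there, so missing days never need special handling. The following is only a minimal sketch of that idea, not the method from the question: the function name `week_number` and the sample dates are made up for illustration, and it numbers weeks by the calendar (a week with no data still consumes a number).

```python
import pandas as pd

DAYS = ['Monday', 'Tuesday', 'Wednesday',
        'Thursday', 'Friday', 'Saturday', 'Sunday']

def week_number(index, startday='Wednesday'):
    """Hypothetical helper: 1-based week number with weeks beginning on `startday`."""
    offset = DAYS.index(startday)
    # Days elapsed since the most recent `startday` (0 on the start day itself).
    days_into_week = (index.dayofweek - offset) % 7
    # Roll each date back to the first day of its custom week.
    week_start = index.normalize() - pd.to_timedelta(days_into_week, unit='D')
    # Count whole weeks since the earliest week present in the data.
    return (week_start - week_start.min()).days // 7 + 1

# Illustrative dates only: Wed/Thu/Fri, then Wed/Sat/Sun, then Thu/Fri.
dates = pd.DatetimeIndex(['2011-01-05', '2011-01-06', '2011-01-07',
                          '2011-01-12', '2011-01-15', '2011-01-16',
                          '2011-01-20', '2011-01-21'])
weeks = week_number(dates).tolist()
print(weeks)  # [1, 1, 1, 2, 2, 2, 3, 3]
```

Because the week boundary comes from the date itself, a missing Wednesday does not blur where one week ends and the next begins.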
I created a function that would manipulate the week to start whenever I choose, ie 0 would be Wednesday instead of Sunday.
You can do this more efficiently with DatetimeIndex.dayofweek (or Series.dt.dayofweek), combined with an offset:
The day of the week with Monday=0, Sunday=6
>>> import pandas as pd
>>> days = ['Monday', 'Tuesday', 'Wednesday',
...         'Thursday', 'Friday', 'Saturday', 'Sunday']
>>> def custom_dayofweek(ser, startday='Monday'):
...     # Use enumerate/reversed dict if you really want
...     # to optimize speed
...     offset = days.index(startday)
...     if isinstance(ser, pd.Series):
...         # otherwise: assume DatetimeIndex
...         ser = ser.dt
...     # Shift so that startday maps to 0, wrapping around the week
...     return (ser.dayofweek - offset) % 7
...
>>> rng = pd.date_range('1/1/2011', periods=72)
>>> custom_dayofweek(rng, startday='Sunday')
Int64Index([6, 0, 1, 2, 3, 4, 5, 6, 0, 1,
            ...
            5, 6, 0, 1, 2, 3, 4, 5, 6, 0],
           dtype='int64', length=72)
You can check that this lines up correctly:
>>> rng.day_name()
Index(['Saturday', 'Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday',
       'Friday', 'Saturday', 'Sunday', 'Monday',
       ...
       'Friday', 'Saturday', 'Sunday', 'Monday', 'Tuesday', 'Wednesday',
       'Thursday', 'Friday', 'Saturday', 'Sunday'],
      dtype='object', length=72)
>>> custom_dayofweek(rng, 'Wednesday')
Int64Index([3, 4, 5, 6, 0, 1, 2, 3, 4, 5,
            ...
            2, 3, 4, 5, 6, 0, 1, 2, 3, 4],
           dtype='int64', length=72)
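As mentioned in the comment above, I was more concerned with applying the correct week number to the data. I was able to solve it with the code below. The logic is as follows.
1) First convert the date index to ordinal numbers.
2) Use the ordinals to compute the start and end date of each week, knowing that a week is always a fixed 7 days.
3) Assign the week number to the data.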
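# Assumes df has a default integer index (0..n-1) and is sorted by
# merchant_name, then date. 'temp_date' holds ordinal day numbers and
# 'DOW' holds the custom day number (0 = first day of the week).
df['week'] = 0
current_merchant = None
for counter in range(len(df)):
    if df.loc[counter, 'merchant_name'] != current_merchant:
        # New merchant: restart the count at the week containing its first date.
        current_merchant = df.loc[counter, 'merchant_name']
        startdate = df.loc[counter, 'temp_date'] - df.loc[counter, 'DOW']
        enddate = startdate + 6
        week_number = 1
    elif df.loc[counter, 'temp_date'] > enddate:
        # Past the end of the current 7-day window: move on to the next week.
        enddate += 7
        week_number += 1
    # .loc avoids the chained assignment df['week'][counter] = ...
    df.loc[counter, 'week'] = week_number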
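The three steps can also be expressed without an explicit loop. The sketch below is one possible vectorized variant, not the code from this answer: the helper name `weeks_per_group`, the `date` column, and the `start_dow` parameter (Monday=0, so Wednesday is 2) are assumptions made for illustration, and it works on datetimes directly rather than ordinals.

```python
import pandas as pd

def weeks_per_group(df, date_col='date', group_col='merchant_name', start_dow=2):
    """Hypothetical helper: per-group 1-based week number, weeks starting on start_dow."""
    dates = pd.to_datetime(df[date_col])
    # Days elapsed since the most recent custom week start.
    days_into_week = (dates.dt.dayofweek - start_dow) % 7
    # First day of the custom week each row falls in.
    week_start = dates.dt.normalize() - pd.to_timedelta(days_into_week, unit='D')
    # Earliest week start seen for each merchant, broadcast back to every row.
    first_start = week_start.groupby(df[group_col]).transform('min')
    return (week_start - first_start).dt.days // 7 + 1

# Made-up sample data for two merchants.
df = pd.DataFrame({
    'merchant_name': ['A', 'A', 'A', 'B', 'B'],
    'date': pd.to_datetime(['2011-01-05', '2011-01-07', '2011-01-13',
                            '2011-01-10', '2011-01-12']),
})
df['week'] = weeks_per_group(df)
print(df['week'].tolist())  # [1, 1, 2, 1, 2]
```

Unlike the loop, this variant numbers weeks by the calendar, so a gap of several weeks skips numbers instead of advancing by one. It also does not require the rows to be sorted.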