根据日期范围重复数据框行
Repeating rows of dataframe based on date range
我有一个这样的数据框:
Col1 Col2 Col3 StartDate EndDate Qty
24HR A1 B1 1/1/2020 2/28/2020 4.2
asd A2 B2 5/1/2020 8/31/2020 35
我想根据 StartDate
和 EndDate
之间的月数重复数据帧的每一行,以获得以下排序:
Col1 Col2 Col3 StartDate EndDate Qty
24HR A1 B1 1/1/2020 1/31/2020 4.2
24HR A1 B1 2/1/2020 2/28/2020 4.2
asd A2 B2 5/1/2020 5/31/2020 35
asd A2 B2 6/1/2020 6/30/2020 35
asd A2 B2 7/1/2020 7/31/2020 35
asd A2 B2 8/1/2020 8/31/2020 35
我不确定如何解决这个问题。
让我们试试explode
df.StartDate=df.apply(lambda x : pd.date_range(start=x['StartDate'], end=x['EndDate'], freq='MS') ,axis=1)
df=df.explode('StartDate')
df.EndDate=df.StartDate+pd.tseries.offsets.MonthEnd()
df
Col1 Col2 Col3 StartDate EndDate Qty
0 24HR A1 B1 2020-01-01 2020-01-31 4.2
0 24HR A1 B1 2020-02-01 2020-02-29 4.2
1 asd A2 B2 2020-05-01 2020-05-31 35.0
1 asd A2 B2 2020-06-01 2020-06-30 35.0
1 asd A2 B2 2020-07-01 2020-07-31 35.0
1 asd A2 B2 2020-08-01 2020-08-31 35.0
#Coerce the dates to datetime
df.EndDate=pd.to_datetime(df.EndDate)
df.StartDate=pd.to_datetime(df.StartDate)
#Find the time delta in months
df['t']=((df.EndDate.sub(df.StartDate))/ np.timedelta64(1, 'M')).round(0).astype(int)
#Use `np.repeat`
pd.DataFrame(np.repeat(df.values,df.t,axis=0),columns=df.columns).drop(columns=['t'])
你可以试试这个...
list=[]
for i in range(df.shape[0]):
list.append(int(df['EndDate'][i][0])-int(df['StartDate'][i][0])+1)
df= df.reindex(df.index.repeat(pd.Series(list)))
print(df)
# output
Col1 Col2 Col3 StartDate EndDate Qty
0 24HR A1 B1 1/1/2020 2/28/2020 4.2
0 24HR A1 B1 1/1/2020 2/28/2020 4.2
1 asd A2 B2 5/1/2020 8/31/2020 35.0
1 asd A2 B2 5/1/2020 8/31/2020 35.0
1 asd A2 B2 5/1/2020 8/31/2020 35.0
1 asd A2 B2 5/1/2020 8/31/2020 35.0
编辑 - 此代码的简短版本
df= df.reindex(df.index.repeat(pd.Series(df.EndDate.str[0].astype(int)+1-df.StartDate.str[0].astype(int))))
我有一个这样的数据框:
Col1 Col2 Col3 StartDate EndDate Qty
24HR A1 B1 1/1/2020 2/28/2020 4.2
asd A2 B2 5/1/2020 8/31/2020 35
我想根据 StartDate
和 EndDate
之间的月数重复数据帧的每一行,以获得以下排序:
Col1 Col2 Col3 StartDate EndDate Qty
24HR A1 B1 1/1/2020 1/31/2020 4.2
24HR A1 B1 2/1/2020 2/28/2020 4.2
asd A2 B2 5/1/2020 5/31/2020 35
asd A2 B2 6/1/2020 6/30/2020 35
asd A2 B2 7/1/2020 7/31/2020 35
asd A2 B2 8/1/2020 8/31/2020 35
我不确定如何解决这个问题。
让我们试试explode
df.StartDate=df.apply(lambda x : pd.date_range(start=x['StartDate'], end=x['EndDate'], freq='MS') ,axis=1)
df=df.explode('StartDate')
df.EndDate=df.StartDate+pd.tseries.offsets.MonthEnd()
df
Col1 Col2 Col3 StartDate EndDate Qty
0 24HR A1 B1 2020-01-01 2020-01-31 4.2
0 24HR A1 B1 2020-02-01 2020-02-29 4.2
1 asd A2 B2 2020-05-01 2020-05-31 35.0
1 asd A2 B2 2020-06-01 2020-06-30 35.0
1 asd A2 B2 2020-07-01 2020-07-31 35.0
1 asd A2 B2 2020-08-01 2020-08-31 35.0
#Coerce the dates to datetime
df.EndDate=pd.to_datetime(df.EndDate)
df.StartDate=pd.to_datetime(df.StartDate)
#Find the time delta in months
df['t']=((df.EndDate.sub(df.StartDate))/ np.timedelta64(1, 'M')).round(0).astype(int)
#Use `np.repeat`
pd.DataFrame(np.repeat(df.values,df.t,axis=0),columns=df.columns).drop(columns=['t'])
你可以试试这个...
list=[]
for i in range(df.shape[0]):
list.append(int(df['EndDate'][i][0])-int(df['StartDate'][i][0])+1)
df= df.reindex(df.index.repeat(pd.Series(list)))
print(df)
# output
Col1 Col2 Col3 StartDate EndDate Qty
0 24HR A1 B1 1/1/2020 2/28/2020 4.2
0 24HR A1 B1 1/1/2020 2/28/2020 4.2
1 asd A2 B2 5/1/2020 8/31/2020 35.0
1 asd A2 B2 5/1/2020 8/31/2020 35.0
1 asd A2 B2 5/1/2020 8/31/2020 35.0
1 asd A2 B2 5/1/2020 8/31/2020 35.0
编辑 - 此代码的简短版本
df= df.reindex(df.index.repeat(pd.Series(df.EndDate.str[0].astype(int)+1-df.StartDate.str[0].astype(int))))