How many days between series data
I have the following series.
df
produces:
Date
2001-01-03 True
2002-07-24 True
2002-07-29 True
2008-09-30 True
2008-10-13 True
2008-10-28 True
2008-11-13 True
2008-11-21 True
2008-11-24 True
2008-12-16 True
2009-03-10 True
2009-03-23 True
Name: pct_day, dtype: bool
How can I find the number of days between the True values, excluding weekends?
This seems to work:
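import pandas as pd

# Example frame: one row per calendar day, each labelled 'True' (a string here)
df = pd.DataFrame({'Date': pd.date_range(start='2/1/2018', end='2/08/2018', freq='D'),
                   'Label': 'True'})
df['DayOfWeek'] = df['Date'].dt.day_name()
# Keep weekday rows only; .copy() avoids a SettingWithCopyWarning on the next assignment
df = df[(df.DayOfWeek != 'Saturday') & (df.DayOfWeek != 'Sunday') & (df.Label == 'True')].copy()
# diff() gives the calendar-day gap between the remaining rows (Friday to Monday is 3 days)
df['Diff'] = df['Date'].diff()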
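Note that dropping weekend rows and then calling diff() still measures calendar time between the rows that remain. As a minimal sketch of the difference, the two hypothetical dates below (a Friday and the following Monday) are compared with both diff() and np.busday_count; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Two weekday dates on either side of a weekend: Friday and the following Monday
dates = pd.Series(pd.to_datetime(["2018-02-02", "2018-02-05"]))

# Calendar-day gap: the weekend is still counted
calendar_gap = dates.diff().dt.days.iloc[1]

# Business-day gap: counts weekdays in the half-open interval [start, end)
business_gap = np.busday_count(dates.iloc[0].date(), dates.iloc[1].date())

print(calendar_gap, business_gap)  # 3 calendar days versus 1 business day
```

If the goal is to exclude weekends from the count itself, the business-day figure is the one that matches.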
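You can do it like this
(the sample DataFrame is created only for the sake of the example):
>>> import pandas.util.testing as tm  # deprecated since pandas 1.0; any DataFrame with a DatetimeIndex works
>>> df = tm.makeTimeDataFrame(freq='M', nper=5)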
>>> print(df)
A B C D
2000-01-31 1.051346 1.722165 -0.659687 1.026716
2000-02-29 0.352166 1.699898 1.469741 -0.138593
2000-03-31 -0.202217 -0.470095 0.169060 -0.241817
2000-04-30 0.446261 1.518129 2.263510 1.800027
2000-05-31 -0.088365 1.923264 1.763859 0.348480
The diff method computes the time difference (a timedelta) between each date and the one before it. The first entry is NaT, since nothing precedes it.
>>> df['time_delta'] = df.index.to_series().diff()
>>> print(df)
A B C D time_delta
2000-01-31 1.051346 1.722165 -0.659687 1.026716 NaT
2000-02-29 0.352166 1.699898 1.469741 -0.138593 29 days
2000-03-31 -0.202217 -0.470095 0.169060 -0.241817 31 days
2000-04-30 0.446261 1.518129 2.263510 1.800027 30 days
2000-05-31 -0.088365 1.923264 1.763859 0.348480 31 days
Then, if you want the number of days as a float rather than as a timedelta object, you can use the Series.dt accessor:
>>> days = df.time_delta.dt.days
>>> print(days)
2000-01-31 NaN
2000-02-29 29.0
2000-03-31 31.0
2000-04-30 30.0
2000-05-31 31.0
Freq: M, dtype: float64
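As a rough sketch of the same idea applied to the question's data, the snippet below rebuilds the first four dates of the series (the name s and the shortened date list are assumptions made for illustration), takes the gaps with diff(), and converts them to pandas' nullable integer dtype so the missing first value does not force floats. These are calendar days, so weekends are still included.

```python
import pandas as pd

# First four dates from the question, as a boolean Series indexed by date
s = pd.Series(
    True,
    index=pd.DatetimeIndex(
        ["2001-01-03", "2002-07-24", "2002-07-29", "2008-09-30"], name="Date"
    ),
    name="pct_day",
)

# Calendar-day gaps between consecutive True dates (first entry is NaN)
gaps = s[s].index.to_series().diff().dt.days

# Nullable integer dtype keeps the missing first value without using floats
gaps_int = gaps.astype("Int64")
print(gaps_int)
```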
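You can use the to_frame() method on the index to turn the index into a column, then call diff() on that column:
# df is the boolean Series from the question; df[df] keeps only the True rows
df2 = df[df].index.to_frame()
# to_frame() names the new column after the index, here 'Date'
df2['diff'] = df2['Date'].diff()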
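To count the business days between consecutive dates in the 'Date' column, use np.busday_count in a loop:
import pandas as pd
import numpy as np

# Assumes df has a default integer index and a datetime 'Date' column
dayA = None
for index, row in df.iterrows():
    dayB = row['Date'].date()  # np.busday_count expects day-precision dates
    if index > 0:
        # Weekdays in the half-open interval [dayA, dayB); weekends are skipped
        print(np.busday_count(dayA, dayB))
    dayA = dayB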
Given these dates:
2001-01-03
2002-07-24
2002-07-29
2008-09-30
2008-10-13
2008-10-28
2008-11-13
2008-11-21
2008-11-24
2008-12-16
2009-03-10
2009-03-23
The output will be:
405
3
1611
9
11
12
6
1
16
60
9
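The loop can also be avoided, since np.busday_count accepts arrays. The following is a minimal sketch, assuming the twelve dates listed above are available as a DatetimeIndex (the names dates, days and gaps are illustrative):

```python
import numpy as np
import pandas as pd

dates = pd.to_datetime([
    "2001-01-03", "2002-07-24", "2002-07-29", "2008-09-30",
    "2008-10-13", "2008-10-28", "2008-11-13", "2008-11-21",
    "2008-11-24", "2008-12-16", "2009-03-10", "2009-03-23",
])

# np.busday_count works on day-precision datetime64 values
days = dates.values.astype("datetime64[D]")

# Pair each date with the next one and count weekdays in [start, end)
gaps = np.busday_count(days[:-1], days[1:])
print(gaps.tolist())
# [405, 3, 1611, 9, 11, 12, 6, 1, 16, 60, 9]
```

This gives the same counts as the loop. Public holidays are not excluded unless they are passed through the holidays argument of np.busday_count.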