Creating a cross-sectional dataframe from a time series in Python
Suppose we have a time series indexed by minute, as follows:
df=
Time (HH:MM) Value
01/01/2014 00:00 1
01/01/2014 00:01 2
01/01/2014 00:02 3
01/01/2014 00:03 4
...
01/08/2014 00:00 5000
...
I am looking to "group" the dataset by week, as follows:
df2=
Week Val1 Val2 Val3 Val4 ...
1 1 2 3 4 ...
2 5000 ...
3
4
...
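In other words, each 1-minute observation in week 1 (01/01/2014-01/08/2014) is represented as a column in df2. (Each week should have 10,080 minutes/columns.)
I have tried a few functions, including groupby(), but most of them seem to aggregate the data rather than split it into the individual columns I am looking for.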
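Edit: It doesn't necessarily have to be in dataframe format, but I am using it as input to a function that takes one week at a time, something like creating a histogram of the values for each week.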
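If the end goal is only to pass each week's values to a function (such as the per-week histogram mentioned in the edit), a wide reshape may not be needed at all; grouping by week and iterating is enough. A minimal sketch, assuming the timestamps are strings in `MM/DD/YYYY HH:MM` format and using `numpy.histogram` with an arbitrary 10 bins; the helper name `weekly_histograms` is hypothetical, and `dt.isocalendar().week` stands in for `weekofyear`:

```python
import numpy as np
import pandas as pd

# Sample data from the question (only the rows shown there).
df = pd.DataFrame({
    'Time (HH:MM)': ['01/01/2014 00:00', '01/01/2014 00:01', '01/01/2014 00:02',
                     '01/01/2014 00:03', '01/08/2014 00:00'],
    'Value': [1, 2, 3, 4, 5000],
})


def weekly_histograms(frame, bins=10):
    """Hypothetical helper: one numpy.histogram result per ISO week."""
    times = pd.to_datetime(frame['Time (HH:MM)'], format='%m/%d/%Y %H:%M')
    # ISO week number per row, aligned with frame's index.
    weeks = times.dt.isocalendar().week
    result = {}
    for week, values in frame['Value'].groupby(weeks):
        counts, edges = np.histogram(values.to_numpy(), bins=bins)
        result[int(week)] = (counts, edges)
    return result


hists = weekly_histograms(df)
print(sorted(hists))  # [1, 2]
```

Each dictionary entry holds the bin counts and bin edges for one week, so a downstream function can work on a single week without building 10,080 columns.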
You need weekofyear + cumcount to number the rows within each week for the new column names, and then reshape with set_index and unstack:
1. Solution if df is a DataFrame and Time (HH:MM) is a column:
print (type(df))
<class 'pandas.core.frame.DataFrame'>
print (df.columns)
Index(['Time (HH:MM)', 'Value'], dtype='object')
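import pandas as pd
weeks = pd.to_datetime(df['Time (HH:MM)']).dt.weekofyear.rename('Week')  # week number of each row
countweeks = df.groupby(weeks).cumcount() + 1  # 1-based position of each row within its week
df = df.set_index([weeks, countweeks])['Value'].unstack().add_prefix('Val')  # positions become columns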
print (df)
Val1 Val2 Val3 Val4
Week
1 1.0 2.0 3.0 4.0
2 5000.0 NaN NaN NaN
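`Series.dt.weekofyear` was deprecated in pandas 1.1 and removed in pandas 2.0. A minimal sketch of the same cumcount + set_index + unstack idea using `Series.dt.isocalendar().week` instead, assuming the column layout from the question and an explicit `MM/DD/YYYY HH:MM` parse format:

```python
import pandas as pd

df = pd.DataFrame({
    'Time (HH:MM)': ['01/01/2014 00:00', '01/01/2014 00:01', '01/01/2014 00:02',
                     '01/01/2014 00:03', '01/08/2014 00:00'],
    'Value': [1, 2, 3, 4, 5000],
})

times = pd.to_datetime(df['Time (HH:MM)'], format='%m/%d/%Y %H:%M')
# ISO week number (returned as UInt32), cast to int64 and aligned with df's index.
weeks = times.dt.isocalendar().week.astype('int64').rename('Week')
# 1-based position of each observation inside its week.
countweeks = df.groupby(weeks).cumcount() + 1

out = (df.set_index([weeks, countweeks])['Value']
         .unstack()
         .add_prefix('Val'))
print(out)
```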
Another solution with pivot:
weeks = pd.to_datetime(df['Time (HH:MM)']).dt.weekofyear.rename('Week')
# column labels 'Val1', 'Val2', ... built as strings
countweeks = df.groupby(weeks).cumcount().add(1).astype(str).radd('Val')
df = pd.pivot(index=weeks, columns=countweeks, values=df['Value'])
print (df)
Val1 Val2 Val3 Val4
Week
1 1.0 2.0 3.0 4.0
2 5000.0 NaN NaN NaN
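In current pandas, `pd.pivot` takes the DataFrame as its first argument and expects `index`, `columns` and `values` to be column labels, so the Series-based call above is likely to fail there. A sketch of the same pivot idea that first attaches the keys as columns; the helper column name `Pos` is hypothetical. It keeps integer positions and adds the `Val` prefix after pivoting, because string labels such as `Val10` would sort before `Val2` once a week has more than nine observations:

```python
import pandas as pd

df = pd.DataFrame({
    'Time (HH:MM)': ['01/01/2014 00:00', '01/01/2014 00:01', '01/01/2014 00:02',
                     '01/01/2014 00:03', '01/08/2014 00:00'],
    'Value': [1, 2, 3, 4, 5000],
})

times = pd.to_datetime(df['Time (HH:MM)'], format='%m/%d/%Y %H:%M')
weeks = times.dt.isocalendar().week.astype('int64')
# Integer position within each week, so columns keep numeric order.
pos = df.groupby(weeks).cumcount() + 1

out = (df.assign(Week=weeks, Pos=pos)
         .pivot(index='Week', columns='Pos', values='Value')
         .add_prefix('Val')
         .rename_axis(columns=None))  # drop the leftover 'Pos' axis name
print(out)
```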
If you need to replace NaN with 0, add the parameter fill_value=0 to unstack:
weeks = pd.to_datetime(df['Time (HH:MM)']).dt.weekofyear.rename('Week')
countweeks = df.groupby(weeks).cumcount() + 1
df = df.set_index([weeks, countweeks])['Value'].unstack(fill_value=0).add_prefix('Val')
print (df)
Val1 Val2 Val3 Val4
Week
1 1 2 3 4
2 5000 0 0 0
In the second solution, use fillna:
weeks = pd.to_datetime(df['Time (HH:MM)']).dt.weekofyear.rename('Week')
countweeks = df.groupby(weeks).cumcount().add(1).astype(str).radd('Val')
df = pd.pivot(index=weeks, columns=countweeks, values=df['Value']).fillna(0)
print (df)
Val1 Val2 Val3 Val4
Week
1 1.0 2.0 3.0 4.0
2 5000.0 0.0 0.0 0.0
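The two ways of filling differ in the resulting dtype, as the outputs above suggest: `unstack(fill_value=0)` fills while reshaping, so integer values can stay integers, whereas `fillna(0)` runs after the missing values have already forced a float dtype. A small check on the sample data, using `isocalendar` in place of `weekofyear`:

```python
import pandas as pd

df = pd.DataFrame({
    'Time (HH:MM)': ['01/01/2014 00:00', '01/01/2014 00:01', '01/01/2014 00:02',
                     '01/01/2014 00:03', '01/08/2014 00:00'],
    'Value': [1, 2, 3, 4, 5000],
})
weeks = (pd.to_datetime(df['Time (HH:MM)'], format='%m/%d/%Y %H:%M')
           .dt.isocalendar().week.astype('int64').rename('Week'))
pos = df.groupby(weeks).cumcount() + 1
stacked = df.set_index([weeks, pos])['Value']

# Fill during the reshape: no NaN is ever created, dtype stays integer.
filled_early = stacked.unstack(fill_value=0)
# Reshape first (NaN makes everything float), then fill.
filled_late = stacked.unstack().fillna(0)

print(filled_early.dtypes)
print(filled_late.dtypes)
```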
2. Solution if s is a Series and Time (HH:MM) is the index:
print (s)
Time (HH:MM)
01/01/2014 00:00 1
01/01/2014 00:01 2
01/01/2014 00:02 3
01/01/2014 00:03 4
01/08/2014 00:00 5000
Name: Value, dtype: int64
print (type(s))
<class 'pandas.core.series.Series'>
print (s.index)
Index(['01/01/2014 00:00', '01/01/2014 00:01', '01/01/2014 00:02',
'01/01/2014 00:03', '01/08/2014 00:00'],
dtype='object', name='Time (HH:MM)')
weeks = pd.to_datetime(s.index).weekofyear.rename('Week')
countweeks = s.groupby(weeks).cumcount() + 1
df = s.to_frame().set_index([weeks, countweeks])['Value'].unstack().add_prefix('Val')
print (df)
Val1 Val2 Val3 Val4
Week
1 1.0 2.0 3.0 4.0
2 5000.0 NaN NaN NaN
Second solution:
weeks = pd.to_datetime(s.index).weekofyear.rename('Week')
countweeks = s.groupby(weeks).cumcount().add(1).astype(str).radd('Val')
df = pd.pivot(index=weeks, columns=countweeks, values=s)
print (df)
Val1 Val2 Val3 Val4
Week
1 1.0 2.0 3.0 4.0
2 5000.0 NaN NaN NaN
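If `s` already has a real `DatetimeIndex` rather than the string index shown above, the `pd.to_datetime` step is unnecessary. A sketch on the same five observations, using `DatetimeIndex.isocalendar()` (available since pandas 1.1) in place of `weekofyear`; the week numbers are converted to a plain array so that grouping is positional rather than index-aligned:

```python
import pandas as pd

# The same five observations, this time on a DatetimeIndex.
idx = pd.DatetimeIndex(['2014-01-01 00:00', '2014-01-01 00:01', '2014-01-01 00:02',
                        '2014-01-01 00:03', '2014-01-08 00:00'], name='Time (HH:MM)')
s = pd.Series([1, 2, 3, 4, 5000], index=idx, name='Value')

# ISO week per timestamp as a plain int64 Index, so grouping is positional.
weeks = pd.Index(s.index.isocalendar().week.to_numpy(dtype='int64'), name='Week')
pos = s.groupby(weeks).cumcount() + 1

# Replace the timestamps by (Week, position) and move positions to columns.
out = s.set_axis(pd.MultiIndex.from_arrays([weeks, pos])).unstack().add_prefix('Val')
print(out)
```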
3. Solution if df is a DataFrame and Time (HH:MM) is the index:
print (df)
Value
Time (HH:MM)
01/01/2014 00:00 1
01/01/2014 00:01 2
01/01/2014 00:02 3
01/01/2014 00:03 4
01/08/2014 00:00 5000
print (type(df))
<class 'pandas.core.frame.DataFrame'>
print (df.index)
Index(['01/01/2014 00:00', '01/01/2014 00:01', '01/01/2014 00:02',
'01/01/2014 00:03', '01/08/2014 00:00'],
dtype='object', name='Time (HH:MM)')
weeks = pd.to_datetime(df.index).weekofyear.rename('Week')
countweeks = df.groupby(weeks).cumcount() + 1
df = df.set_index([weeks, countweeks])['Value'].unstack().add_prefix('Val')
Second solution (both produce the output below):
weeks = pd.to_datetime(df.index).weekofyear.rename('Week')
countweeks = df.groupby(weeks).cumcount().add(1).astype(str).radd('Val')
df = pd.pivot(index=weeks, columns=countweeks, values=df['Value'])
print (df)
Val1 Val2 Val3 Val4
Week
1 1.0 2.0 3.0 4.0
2 5000.0 NaN NaN NaN
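A caveat that applies to all of the week-based solutions above: `weekofyear` returns ISO week numbers, and ISO weeks start on Monday. 1 January 2014 was a Wednesday, so ISO week 1 of 2014 contains only 1 to 5 January (7,200 minutes, not 10,080), and week numbers repeat from one year to the next. If weeks should instead be counted in 7-day blocks from the first timestamp, as the question describes, one option is to floor-divide the elapsed time by seven days. A sketch on two weeks of made-up minute data:

```python
import numpy as np
import pandas as pd

# Synthetic data: two full weeks of 1-minute observations starting 2014-01-01.
times = pd.date_range('2014-01-01', periods=2 * 7 * 24 * 60, freq='min')
df = pd.DataFrame({'Time (HH:MM)': times, 'Value': np.arange(len(times))})

# Week 1 = first 7 days after the first timestamp, week 2 = next 7 days, ...
elapsed = df['Time (HH:MM)'] - df['Time (HH:MM)'].min()
weeks = (elapsed // pd.Timedelta(days=7) + 1).rename('Week')
pos = df.groupby(weeks).cumcount() + 1

out = df.set_index([weeks, pos])['Value'].unstack().add_prefix('Val')
print(out.shape)  # (2, 10080)
```

Each full week then yields exactly 7 × 24 × 60 = 10,080 columns.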
You can use pivot_table like this:
In [3192]: df['Week'] = df['Time (HH:MM)'].dt.weekofyear
In [3193]: df['ValCount'] = 'Val' + df.groupby('Week').cumcount().add(1).astype(str)
In [3194]: df.pivot_table(index='Week', columns='ValCount', values='Value').reset_index()
Out[3194]:
ValCount Week Val1 Val2 Val3 Val4
0 1 1.0 2.0 3.0 4.0
1 2 5000.0 NaN NaN NaN
To have Week in the index:
In [3198]: df.pivot_table(index='Week', columns='ValCount',
values='Value').rename_axis(None, axis=1)
Out[3198]:
Val1 Val2 Val3 Val4
Week
1 1.0 2.0 3.0 4.0
2 5000.0 NaN NaN NaN
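`pivot_table` aggregates with `aggfunc='mean'` by default. That is harmless in this answer because the cumcount-based `ValCount` makes every (Week, ValCount) pair unique, but the difference from `pivot` matters if duplicates ever slip in: `pivot_table` silently averages them, while `pivot` raises. A sketch on hypothetical data with one duplicated pair:

```python
import pandas as pd

# Hypothetical data with a duplicated (Week, ValCount) pair.
df = pd.DataFrame({'Week': [1, 1, 1],
                   'ValCount': ['Val1', 'Val1', 'Val2'],
                   'Value': [10, 20, 30]})

# Duplicates are combined with the default aggfunc ('mean').
averaged = df.pivot_table(index='Week', columns='ValCount', values='Value')
print(averaged)

pivot_failed = False
try:
    df.pivot(index='Week', columns='ValCount', values='Value')
except ValueError as exc:
    # pivot refuses to guess how to combine duplicate entries.
    pivot_failed = True
    print('pivot failed:', exc)
```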
Details
In [3202]: df
Out[3202]:
Time (HH:MM) Value
0 2014-01-01 00:00:00 1
1 2014-01-01 00:01:00 2
2 2014-01-01 00:02:00 3
3 2014-01-01 00:03:00 4
4 2014-01-08 00:00:00 5000
In [3203]: df.dtypes
Out[3203]:
Time (HH:MM) datetime64[ns]
Value int64
dtype: object
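The pivot_table answer relies on `Time (HH:MM)` already having a `datetime64[ns]` dtype, as the details show. If the column holds strings like `01/01/2014 00:00` instead, a sketch of the conversion with an explicit format, assuming month/day/year order as in the question:

```python
import pandas as pd

df = pd.DataFrame({
    'Time (HH:MM)': ['01/01/2014 00:00', '01/01/2014 00:01', '01/01/2014 00:02',
                     '01/01/2014 00:03', '01/08/2014 00:00'],
    'Value': [1, 2, 3, 4, 5000],
})

# An explicit format avoids any month/day vs day/month ambiguity.
df['Time (HH:MM)'] = pd.to_datetime(df['Time (HH:MM)'], format='%m/%d/%Y %H:%M')
print(df.dtypes)
```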