Creating a cross-sectional dataframe from a time series in Python

Suppose we have a time series indexed by minute, as follows:

df=

Time (HH:MM)     Value
01/01/2014 00:00  1 
01/01/2014 00:01  2
01/01/2014 00:02  3
01/01/2014 00:03  4
...
01/08/2014 00:00  5000
...

I'm looking to "group" the dataset by week, like this:

df2=

Week  Val1 Val2 Val3 Val4 ...
1     1    2    3    4    ...
2     5000 ...
3
4
...

In other words, each 1-minute observation in week 1 (01/01/2014-01/08/2014) is represented as a column in df2. (Each week should have 10,080 minutes/columns.)
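The 10,080 figure is 7 × 24 × 60 minutes. A quick check with the standard library, counting the minutes between the two boundaries given for week 1 (read as month/day/year):

```python
from datetime import datetime, timedelta

# Week 1 runs from 01/01/2014 00:00 up to (not including) 01/08/2014 00:00
start = datetime(2014, 1, 1, 0, 0)
end = datetime(2014, 1, 8, 0, 0)

minutes_per_week = 7 * 24 * 60
# Dividing two timedeltas with // gives a whole number of minutes
minutes_between = (end - start) // timedelta(minutes=1)

print(minutes_per_week, minutes_between)
```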

I've tried a few functions, including groupby(), but most of them seem to aggregate the data rather than split it into the individual columns I'm looking for.

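EDIT: It doesn't necessarily have to be in DataFrame format, but I'm using it as the input to a function that takes weeks. Something like trying to create a histogram of values for each week.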
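Since the stated goal is a histogram per week, the wide frame may not be required at all: grouping the long data by week and binning each group gives one array of counts per week. A minimal sketch, assuming the timestamps are already datetime, ISO weeks taken from dt.isocalendar().week, and hypothetical bin edges chosen only for illustration:

```python
import numpy as np
import pandas as pd

# Sample data shaped like the question's df, timestamps already parsed
df = pd.DataFrame({
    'Time (HH:MM)': pd.to_datetime(['2014-01-01 00:00', '2014-01-01 00:01',
                                    '2014-01-01 00:02', '2014-01-01 00:03',
                                    '2014-01-08 00:00']),
    'Value': [1, 2, 3, 4, 5000],
})

# Hypothetical bin edges, for illustration only
edges = [0, 10, 100, 10_000]

week = df['Time (HH:MM)'].dt.isocalendar().week
# week number -> array with one count per bin
hist = {int(wk): np.histogram(vals, bins=edges)[0]
        for wk, vals in df.groupby(week)['Value']}
print(hist)
```

The bins and the week definition are the parts to adapt to the real function's expectations.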

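You need weekofyear + cumcount (to number the rows within each week for the new column names), and then reshape with set_index and unstack. Note that dt.weekofyear was deprecated in pandas 1.1 and removed in 2.0; on newer versions use dt.isocalendar().week instead.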

1. Solution if df is a DataFrame and Time (HH:MM) is a column:

print (type(df))
<class 'pandas.core.frame.DataFrame'>

print (df.columns)
Index(['Time (HH:MM)', 'Value'], dtype='object')

weeks = pd.to_datetime(df['Time (HH:MM)']).dt.weekofyear.rename('Week')
countweeks = df.groupby(weeks).cumcount() + 1
df = df.set_index([weeks, countweeks])['Value'].unstack().add_prefix('Val')
print (df)
        Val1  Val2  Val3  Val4
Week                          
1        1.0   2.0   3.0   4.0
2     5000.0   NaN   NaN   NaN
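On pandas 2.x the dt.weekofyear accessor no longer exists, and dt.isocalendar().week returns the same ISO week numbers. A sketch of solution 1 rewritten that way, assuming the timestamps are month/day/year strings as in the question (the format string is an assumption about the data):

```python
import pandas as pd

# Sample data in the layout of solution 1 (timestamps stored as text in a column)
df = pd.DataFrame({
    'Time (HH:MM)': ['01/01/2014 00:00', '01/01/2014 00:01', '01/01/2014 00:02',
                     '01/01/2014 00:03', '01/08/2014 00:00'],
    'Value': [1, 2, 3, 4, 5000],
})

times = pd.to_datetime(df['Time (HH:MM)'], format='%m/%d/%Y %H:%M')
# ISO week numbers; replaces the removed dt.weekofyear
weeks = times.dt.isocalendar().week.rename('Week')
# 1-based position of each row within its week, used as the column number
countweeks = df.groupby(weeks).cumcount() + 1
out = df.set_index([weeks, countweeks])['Value'].unstack().add_prefix('Val')
print(out)
```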

Another solution with pivot:

weeks = pd.to_datetime(df['Time (HH:MM)']).dt.weekofyear.rename('Week')
countweeks = df.groupby(weeks).cumcount().add(1).astype(str).radd('Val')
df = pd.pivot(index=weeks, columns=countweeks, values=df['Value'])
print (df)
        Val1  Val2  Val3  Val4
Week                          
1        1.0   2.0   3.0   4.0
2     5000.0   NaN   NaN   NaN
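Recent pandas versions expect a DataFrame as the first argument of pd.pivot, so passing only index/columns/values arrays as above may fail there. A hedged sketch that attaches helper columns with assign and calls DataFrame.pivot instead; the helper column names Week and Pos are hypothetical. The position is kept as an integer and prefixed only after pivoting, so the columns stay in numeric order even past Val9:

```python
import pandas as pd

df = pd.DataFrame({
    'Time (HH:MM)': ['01/01/2014 00:00', '01/01/2014 00:01', '01/01/2014 00:02',
                     '01/01/2014 00:03', '01/08/2014 00:00'],
    'Value': [1, 2, 3, 4, 5000],
})

times = pd.to_datetime(df['Time (HH:MM)'], format='%m/%d/%Y %H:%M')
week = times.dt.isocalendar().week
# Integer position inside each week (1, 2, 3, ...), so columns sort numerically
pos = df.groupby(week).cumcount() + 1

out = (df.assign(Week=week, Pos=pos)          # hypothetical helper columns
         .pivot(index='Week', columns='Pos', values='Value')
         .add_prefix('Val')                   # 1 -> 'Val1', 2 -> 'Val2', ...
         .rename_axis(columns=None))          # drop the 'Pos' axis label
print(out)
```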

If you need to replace NaN with 0, add the parameter fill_value=0 to unstack:

weeks = pd.to_datetime(df['Time (HH:MM)']).dt.weekofyear.rename('Week')
countweeks = df.groupby(weeks).cumcount() + 1
df = df.set_index([weeks, countweeks])['Value'].unstack(fill_value=0).add_prefix('Val')
print (df)
      Val1  Val2  Val3  Val4
Week                        
1        1     2     3     4
2     5000     0     0     0

In the second solution, use fillna:

weeks = pd.to_datetime(df['Time (HH:MM)']).dt.weekofyear.rename('Week')
countweeks = df.groupby(weeks).cumcount().add(1).astype(str).radd('Val')
df = pd.pivot(index=weeks, columns=countweeks, values=df['Value']).fillna(0)
print (df)
        Val1  Val2  Val3  Val4
Week                          
1        1.0   2.0   3.0   4.0
2     5000.0   0.0   0.0   0.0
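The two ways of replacing NaN differ in the resulting dtype, which is why the fill_value=0 output shows integers while the fillna(0) output shows floats: unstack(fill_value=0) fills the gaps during the reshape, so the int64 values never pass through NaN, whereas fillna runs after NaN has already turned the columns into floats. A small sketch on a hand-built (Week, position) Series:

```python
import pandas as pd

# (week, position) -> value, like the result of the set_index step above
idx = pd.MultiIndex.from_tuples([(1, 1), (1, 2), (1, 3), (1, 4), (2, 1)],
                                names=['Week', None])
s = pd.Series([1, 2, 3, 4, 5000], index=idx)

filled_in_reshape = s.unstack(fill_value=0)   # gaps filled during unstack
filled_afterwards = s.unstack().fillna(0)     # NaN (float) first, then 0.0

print(filled_in_reshape.dtypes.unique())
print(filled_afterwards.dtypes.unique())
```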

2. Solution if s is a Series and Time (HH:MM) is the index:

print (s)

Time (HH:MM)
01/01/2014 00:00       1
01/01/2014 00:01       2
01/01/2014 00:02       3
01/01/2014 00:03       4
01/08/2014 00:00    5000
Name: Value, dtype: int64

print (type(s))
<class 'pandas.core.series.Series'>

print (s.index)
Index(['01/01/2014 00:00', '01/01/2014 00:01', '01/01/2014 00:02',
       '01/01/2014 00:03', '01/08/2014 00:00'],
      dtype='object', name='Time (HH:MM)')

weeks = pd.to_datetime(s.index).weekofyear.rename('Week')
countweeks = s.groupby(weeks).cumcount() + 1
df = s.to_frame().set_index([weeks, countweeks])['Value'].unstack().add_prefix('Val')
print (df)
        Val1  Val2  Val3  Val4
Week                          
1        1.0   2.0   3.0   4.0
2     5000.0   NaN   NaN   NaN

Second solution:

weeks = pd.to_datetime(s.index).weekofyear.rename('Week')
countweeks = s.groupby(weeks).cumcount().add(1).astype(str).radd('Val')
df = pd.pivot(index=weeks, columns=countweeks, values=s)
print (df)
        Val1  Val2  Val3  Val4
Week                          
1        1.0   2.0   3.0   4.0
2     5000.0   NaN   NaN   NaN
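For the Series case on pandas 2.x, a sketch that turns the ISO week numbers into a plain NumPy array, so grouping and rebuilding the frame are purely positional and do not depend on how the index of isocalendar()'s result lines up with the string index of s. The format string and the helper column names Week and Pos are assumptions:

```python
import pandas as pd

# Series with text timestamps as its index, as in solution 2
s = pd.Series(
    [1, 2, 3, 4, 5000],
    index=pd.Index(['01/01/2014 00:00', '01/01/2014 00:01', '01/01/2014 00:02',
                    '01/01/2014 00:03', '01/08/2014 00:00'], name='Time (HH:MM)'),
    name='Value',
)

times = pd.to_datetime(s.index, format='%m/%d/%Y %H:%M')   # DatetimeIndex
# Plain int64 array of ISO weeks: positional, so no index alignment is involved
weeks = times.isocalendar().week.to_numpy(dtype='int64')
pos = s.groupby(weeks).cumcount().to_numpy() + 1

out = (pd.DataFrame({'Week': weeks, 'Pos': pos, 'Value': s.to_numpy()})
         .pivot(index='Week', columns='Pos', values='Value')
         .add_prefix('Val')
         .rename_axis(columns=None))
print(out)
```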

3. Solution if df is a DataFrame and Time (HH:MM) is the index:

print (df)
                  Value
Time (HH:MM)           
01/01/2014 00:00      1
01/01/2014 00:01      2
01/01/2014 00:02      3
01/01/2014 00:03      4
01/08/2014 00:00   5000

print (type(df))
<class 'pandas.core.frame.DataFrame'>

print (df.index)
Index(['01/01/2014 00:00', '01/01/2014 00:01', '01/01/2014 00:02',
       '01/01/2014 00:03', '01/08/2014 00:00'],
      dtype='object', name='Time (HH:MM)')

weeks = pd.to_datetime(df.index).weekofyear.rename('Week')
countweeks = df.groupby(weeks).cumcount() + 1
df1 = df.set_index([weeks, countweeks])['Value'].unstack().add_prefix('Val')

Second solution (df is not overwritten above, so it can be reused here):

weeks = pd.to_datetime(df.index).weekofyear.rename('Week')
countweeks = df.groupby(weeks).cumcount().add(1).astype(str).radd('Val')
df2 = pd.pivot(index=weeks, columns=countweeks, values=df['Value'])

Both return the same DataFrame:

print (df2)
        Val1  Val2  Val3  Val4
Week                          
1        1.0   2.0   3.0   4.0
2     5000.0   NaN   NaN   NaN
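Note that weekofyear (and isocalendar().week) gives ISO calendar weeks, which start on Mondays. 2014-01-01 is a Wednesday, so with complete minute data ISO week 1 would hold only 1 to 5 January (7,200 minutes), 01/08 would share week 2 with 6 and 7 January, and 29 to 31 December 2014 would fall into ISO week 1 again. If weeks should instead be 7-day blocks counted from the first timestamp, as described in the question, one option is to derive both the week and the column from the elapsed minutes. A sketch for a DataFrame with a DatetimeIndex (as in solution 3), on hypothetical data covering 8 days:

```python
import pandas as pd

MINUTES_PER_WEEK = 7 * 24 * 60   # 10,080

# Hypothetical data: 8 days of minute readings, DatetimeIndex as in solution 3
idx = pd.date_range('2014-01-01 00:00', periods=8 * 24 * 60, freq='min',
                    name='Time (HH:MM)')
df = pd.DataFrame({'Value': range(len(idx))}, index=idx)

# Whole minutes elapsed since the first timestamp (assumed start of week 1)
minutes = (df.index - df.index[0]) // pd.Timedelta(minutes=1)
week = minutes // MINUTES_PER_WEEK + 1            # 7-day block number, 1-based
minute_of_week = minutes % MINUTES_PER_WEEK + 1   # column slot 1..10080

wide = (pd.DataFrame({'Week': week, 'Minute': minute_of_week,
                      'Value': df['Value'].to_numpy()})
          .pivot(index='Week', columns='Minute', values='Value')
          .reindex(columns=range(1, MINUTES_PER_WEEK + 1))   # all 10,080 slots
          .add_prefix('Val')
          .rename_axis(columns=None))
print(wide.shape)
```

Using the minute offset instead of cumcount as the column means a missing minute leaves NaN in its own slot rather than shifting later values to the left, and the reindex keeps all 10,080 columns even if no week is complete.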

You can use pivot_table like this:

In [3192]: df['Week'] = df['Time (HH:MM)'].dt.weekofyear

In [3193]: df['ValCount'] = 'Val' + df.groupby('Week').cumcount().add(1).astype(str)

In [3194]: df.pivot_table(index='Week', columns='ValCount', values='Value').reset_index()
Out[3194]:
ValCount  Week    Val1  Val2  Val3  Val4
0            1     1.0   2.0   3.0   4.0
1            2  5000.0   NaN   NaN   NaN
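pivot_table aggregates (with mean by default), which is harmless here because each (Week, ValCount) pair occurs exactly once. Because ValCount holds strings, though, the columns are sorted lexicographically, so with more than nine values per week the order becomes Val1, Val10, Val11, and so on. A small sketch on hypothetical data with 12 values in one week, comparing string labels with integer positions prefixed afterwards:

```python
import pandas as pd

# Hypothetical data: 12 readings in a single week
df = pd.DataFrame({'Week': [1] * 12, 'Value': range(12)})

# String labels, as in the pivot_table answer
df['ValCount'] = 'Val' + df.groupby('Week').cumcount().add(1).astype(str)
by_string = df.pivot_table(index='Week', columns='ValCount', values='Value')
print(list(by_string.columns[:4]))   # lexicographic: Val1, Val10, Val11, Val12

# Integer positions, prefixed after pivoting
df['Pos'] = df.groupby('Week').cumcount() + 1
by_int = df.pivot_table(index='Week', columns='Pos', values='Value').add_prefix('Val')
print(list(by_int.columns[:4]))      # numeric: Val1, Val2, Val3, Val4
```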

To have Week in the index instead:

In [3198]: df.pivot_table(index='Week', columns='ValCount',
                          values='Value').rename_axis(None, axis=1)
Out[3198]:
        Val1  Val2  Val3  Val4
Week
1        1.0   2.0   3.0   4.0
2     5000.0   NaN   NaN   NaN

Details (the sample df used above, with Time (HH:MM) already converted to datetime):

In [3202]: df
Out[3202]:
         Time (HH:MM)  Value
0 2014-01-01 00:00:00      1
1 2014-01-01 00:01:00      2
2 2014-01-01 00:02:00      3
3 2014-01-01 00:03:00      4
4 2014-01-08 00:00:00   5000

In [3203]: df.dtypes
Out[3203]:
Time (HH:MM)    datetime64[ns]
Value                    int64
dtype: object
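The pivot_table answer relies on Time (HH:MM) already being datetime64, as the dtypes above show. If the column is still text like in the question, a sketch for converting it; the explicit format assumes month/day/year, which pins down that 01/08/2014 means 8 January instead of relying on inference:

```python
import pandas as pd

raw = pd.DataFrame({
    'Time (HH:MM)': ['01/01/2014 00:00', '01/01/2014 00:01', '01/08/2014 00:00'],
    'Value': [1, 2, 5000],
})

# Explicit month/day/year format instead of letting pandas infer it
raw['Time (HH:MM)'] = pd.to_datetime(raw['Time (HH:MM)'], format='%m/%d/%Y %H:%M')
print(raw.dtypes)
```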