Get t-1 value (from previous cell) in a pandas dataframe
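I'm trying to create new columns in which each row holds the value from the previous row (the previous day).
My data looks like this (the original file has 12 columns plus the timestamp, and thousands of rows):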
import numpy as np
import pandas as pd
df = pd.DataFrame(
    {
        "Timestamp": ["1993-11-01", "1993-11-02", "1993-11-03", "1993-11-04", "1993-11-15"],
        "Austria": [6.11, 6.18, 6.17, 6.17, 6.40],
        "Belgium": [7.01, 7.05, 7.2, 7.5, 7.6],
        "France": [7.69, 7.61, 7.67, 7.91, 8.61],
    },
    index=[1, 2, 3, 4, 5],
)
What I have:
Timestamp Austria Belgium France
1 1993-11-01 6.11 7.01 7.69
2 1993-11-02 6.18 7.05 7.61
3 1993-11-03 6.17 7.20 7.67
4 1993-11-04 6.17 7.50 7.91
5 1993-11-15 6.40 7.60 8.61
What I want:
Timestamp Austria t-1 Belgium t-1 France t-1
1 1993-11-01 NaN NaN NaN
2 1993-11-02 6.11 7.01 7.69
3 1993-11-03 6.18 7.05 7.61
4 1993-11-04 6.17 7.20 7.67
5 1993-11-15 6.17 7.50 7.91
This is easy in Excel, but I can't find a way to do it in Python. There must be a way, though. Does anyone know how?
Use shift on the columns:
cols = ["Austria", "Belgium", "France"]
df[cols] = df[cols].shift()
print(df)
Output
Timestamp Austria Belgium France
1 1993-11-01 NaN NaN NaN
2 1993-11-02 6.11 7.01 7.69
3 1993-11-03 6.18 7.05 7.61
4 1993-11-04 6.17 7.20 7.67
5 1993-11-15 6.17 7.50 7.91
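Note that the assignment above overwrites the original columns with their shifted values. If you would rather keep the originals and add the lagged values as new columns named as in the question, something like the following might work. This is only a sketch: the names lagged and result are mine, and it assumes the " t-1" suffix is the naming you want.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Timestamp": ["1993-11-01", "1993-11-02", "1993-11-03", "1993-11-04", "1993-11-15"],
        "Austria": [6.11, 6.18, 6.17, 6.17, 6.40],
        "Belgium": [7.01, 7.05, 7.2, 7.5, 7.6],
        "France": [7.69, 7.61, 7.67, 7.91, 8.61],
    },
    index=[1, 2, 3, 4, 5],
)

cols = ["Austria", "Belgium", "France"]

# Shift each value column down by one row and rename the shifted copies
# (" t-1" is an assumed suffix, taken from the desired output in the question).
lagged = df[cols].shift().add_suffix(" t-1")

# Join on the shared index so the original columns are left untouched.
result = df.join(lagged)
print(result)
```

Since shift moves values by row position, the last row (1993-11-15) receives the values from 1993-11-04 even though the dates are not consecutive.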
Alternative:
df.iloc[:, 1:] = df.iloc[:, 1:].shift()
print(df)
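The iloc version relies on Timestamp being the first column. With 12 value columns, typing every name is tedious, so one option is to select everything except Timestamp by name. A rough sketch, where value_cols is just a name I picked:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Timestamp": ["1993-11-01", "1993-11-02", "1993-11-03", "1993-11-04", "1993-11-15"],
        "Austria": [6.11, 6.18, 6.17, 6.17, 6.40],
        "Belgium": [7.01, 7.05, 7.2, 7.5, 7.6],
        "France": [7.69, 7.61, 7.67, 7.91, 8.61],
    },
    index=[1, 2, 3, 4, 5],
)

# Every column label except "Timestamp", in the original order.
value_cols = df.columns.drop("Timestamp")

# Shift only those columns; Timestamp is left untouched wherever it sits.
df[value_cols] = df[value_cols].shift()
print(df)
```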
First, use df.set_index on the Timestamp column, then use df.shift:
In [4400]: d = df.set_index('Timestamp').shift()
In [4403]: d.columns = [i + ' t-1' for i in d.columns]
In [4406]: d.reset_index(inplace=True)
In [4407]: d
Out[4407]:
Timestamp Austria t-1 Belgium t-1 France t-1
0 1993-11-01 NaN NaN NaN
1 1993-11-02 6.11 7.01 7.69
2 1993-11-03 6.18 7.05 7.61
3 1993-11-04 6.17 7.20 7.67
4 1993-11-15 6.17 7.50 7.91
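All of the approaches above take the previous row, which matches the desired output in the question. If "previous day" were instead meant literally, by calendar date, the 1993-11-15 row would have no t-1 value because 1993-11-14 is not in the data. A possible sketch of that variant, assuming the timestamps parse as daily dates (the names ts and calendar_lag are mine):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Timestamp": ["1993-11-01", "1993-11-02", "1993-11-03", "1993-11-04", "1993-11-15"],
        "Austria": [6.11, 6.18, 6.17, 6.17, 6.40],
        "Belgium": [7.01, 7.05, 7.2, 7.5, 7.6],
        "France": [7.69, 7.61, 7.67, 7.91, 8.61],
    },
    index=[1, 2, 3, 4, 5],
)

# Parse the strings into real dates and use them as the index.
ts = df.assign(Timestamp=pd.to_datetime(df["Timestamp"])).set_index("Timestamp")

# With freq given, shift moves the index forward one calendar day
# and leaves the values where they are.
moved = ts.shift(freq="D")

# Look up each original date in the moved frame; dates whose previous
# calendar day is missing come back as NaN.
calendar_lag = moved.reindex(ts.index).add_suffix(" t-1").reset_index()
print(calendar_lag)
```

Whether row-based or calendar-based lagging is right depends on how gaps in the dates should be treated.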