Difference of two non-consecutive rows - Pandas
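I have a historical price list, and I want to calculate the change between successive prices of each currency. My code updates the list by fetching new prices and appending them to the database. How can I do this?
This is how the data looks in the table: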
Date Hour Currency Price Variation
0 2021-05-01 23:19:21 BAT 1.0700
1 2021-05-01 23:19:21 BTC 47922.1400
2 2021-05-01 23:19:21 DOGE 0.3286
3 2021-05-01 23:19:21 ETH 2451.7400
4 2021-05-01 23:35:50 BAT 1.0600
5 2021-05-01 23:35:50 BTC 47557.2700
6 2021-05-01 23:35:50 DOGE 0.3228
7 2021-05-01 23:35:50 ETH 2438.0300
8 2021-05-01 23:37:20 BAT 1.0500
9 2021-05-01 23:37:20 BTC 47467.0200
10 2021-05-01 23:37:20 DOGE 0.3209
11 2021-05-01 23:37:20 ETH 2435.3000
As you can see, the rows for each currency are not consecutive. For example:
BAT price variation:
0 -> 4 : (1.0600-1.0700)/1.0700 = -0.93%
4 -> 8 : (1.0500-1.0600)/1.0600 = -0.94%
last_value_index -> recent_value_index : (recent_value-last_value)/last_value
Thanks!
We can group by Currency and then apply pct_change() to the Price column:
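The rule in the question pairs each row with the previous row of the same currency. A minimal sketch of that lookup, assuming a DataFrame rebuilt from the question's first 12 rows (Date and Hour omitted): shifting the index within each Currency group gives last_value_index for every recent_value_index.

```python
import pandas as pd

# The question's first 12 rows (Date and Hour omitted for brevity).
df = pd.DataFrame({
    "Currency": ["BAT", "BTC", "DOGE", "ETH"] * 3,
    "Price": [1.0700, 47922.14, 0.3286, 2451.74,
              1.0600, 47557.27, 0.3228, 2438.03,
              1.0500, 47467.02, 0.3209, 2435.30],
})

# For every row, the index of the previous row with the same currency
# (NaN for the first occurrence of each currency).
prev_index = df.index.to_series().groupby(df["Currency"]).shift()

# Print the BAT pairs as "last_value_index -> recent_value_index : change".
bat = prev_index[df["Currency"] == "BAT"].dropna().astype(int)
for recent, last in bat.items():
    last_price, recent_price = df.at[last, "Price"], df.at[recent, "Price"]
    change = (recent_price - last_price) / last_price
    print(f"{last} -> {recent} : {change:.2%}")
# 0 -> 4 : -0.93%
# 4 -> 8 : -0.94%
```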
df['Variation'] = 100*df.groupby('Currency').Price.pct_change()
Or compute the percentage change manually, dividing the difference by the previous price (x.shift()):
df['Variation'] = df.groupby('Currency').Price.transform(lambda x: 100*x.diff()/x.shift())
Output for the newly provided df (which has four more rows than the table in the question):
Date Hour Currency Price Variation
0 2021-05-01 23:19:21 BAT 1.0700 NaN
1 2021-05-01 23:19:21 BTC 47922.1400 NaN
2 2021-05-01 23:19:21 DOGE 0.3286 NaN
3 2021-05-01 23:19:21 ETH 2451.7400 NaN
4 2021-05-01 23:35:50 BAT 1.0600 -0.934579
5 2021-05-01 23:35:50 BTC 47557.2700 -0.761381
6 2021-05-01 23:35:50 DOGE 0.3228 -1.765064
7 2021-05-01 23:35:50 ETH 2438.0300 -0.559195
8 2021-05-01 23:37:20 BAT 1.0500 -0.943396
9 2021-05-01 23:37:20 BTC 47467.0200 -0.189771
10 2021-05-01 23:37:20 DOGE 0.3209 -0.588600
11 2021-05-01 23:37:20 ETH 2435.3000 -0.111976
12 2021-05-02 00:04:40 BAT 1.0200 -2.857143
13 2021-05-02 00:04:40 BTC 46883.6300 -1.229043
14 2021-05-02 00:04:40 DOGE 0.3028 -5.640386
15 2021-05-02 00:04:40 ETH 2397.8200 -1.539030
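The percentage change must be taken relative to the previous price, so a manual version has to divide by x.shift(); dividing by x (the current price) would give (1.06 - 1.07)/1.06 = -0.943% for BAT row 4 instead of the -0.934579 shown above. A small check, assuming the same 12 sample rows as the question, that the shift-based formula matches pct_change() while the divide-by-current variant does not:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Currency": ["BAT", "BTC", "DOGE", "ETH"] * 3,
    "Price": [1.0700, 47922.14, 0.3286, 2451.74,
              1.0600, 47557.27, 0.3228, 2438.03,
              1.0500, 47467.02, 0.3209, 2435.30],
})
prices = df.groupby("Currency")["Price"]

# Built-in: percentage change within each currency group.
builtin = 100 * prices.pct_change()

# Manual: difference divided by the *previous* price in the group.
manual = prices.transform(lambda x: 100 * x.diff() / x.shift())

# Dividing by the current price instead gives a different (wrong) result.
wrong = prices.transform(lambda x: 100 * x.diff() / x)

print(np.allclose(builtin, manual, equal_nan=True))  # True
print(round(builtin[4], 6), round(wrong[4], 6))     # -0.934579 -0.943396
```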
To fill the NaN values with a specific value, e.g. 0.0:
df['Variation'] = 100*df.groupby('Currency').Price.pct_change().fillna(0.)
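Since the question appends new prices over time, the variation can also be computed for just the new batch by looking up each currency's last stored price. A minimal sketch; append_prices is a hypothetical helper, and it assumes both frames are in chronological order and that every row in the batch is newer than the stored history (Date and Hour omitted):

```python
import pandas as pd


def append_prices(history: pd.DataFrame, new_rows: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: append a batch of prices and fill Variation
    (in percent) for the new rows only."""
    # Last stored price of each currency.
    last_price = history.groupby("Currency")["Price"].last()
    new_rows = new_rows.copy()
    # Previous price inside the batch; for a currency's first row in the
    # batch, fall back to its last stored price (NaN if never seen before).
    prev = new_rows.groupby("Currency")["Price"].shift()
    prev = prev.fillna(new_rows["Currency"].map(last_price))
    new_rows["Variation"] = 100 * (new_rows["Price"] - prev) / prev
    return pd.concat([history, new_rows], ignore_index=True)


history = pd.DataFrame({
    "Currency": ["BAT", "BTC", "BAT", "BTC"],
    "Price": [1.07, 47922.14, 1.06, 47557.27],
})
history["Variation"] = 100 * history.groupby("Currency")["Price"].pct_change()

batch = pd.DataFrame({"Currency": ["BAT", "BTC"], "Price": [1.05, 47467.02]})
updated = append_prices(history, batch)
print(updated)
```

This avoids recomputing the whole history on every update; recomputing with groupby().pct_change() over the full table gives the same values and is simpler if the table stays small.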