Looking up a value in a column for most matching values in two other columns
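I retrieve option chain data for a stock through an API and load it into a pandas DataFrame. In the 'expiration' column you can see that in this test case I have three option series, expiring on 2019-08-15, 2019-09-15 and 2019-10-15.
What I want to achieve is:
- For each option series (3 in this test case)
- Find the 'strike' price closest to 'undPrice' (= the price of the underlying stock)
- For that 'strike' closest to 'undPrice', look up the corresponding value in 'IV_model' (= implied volatility)
- Fill that value into the 'desired_outcome' column for all options of that expiration series (so in this test case there are three blocks, each holding a single repeated value)
- So basically there are 3 lookups in one set of data.
Here is the test case code, which is close to my real environment: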
import pandas as pd
undPrice = 202
df = pd.DataFrame(columns=['expiration', 'strike', 'undPrice', 'IV_model', 'desired_outcome'])
df['expiration'] = ['2019-08-15', '2019-08-15', '2019-08-15', '2019-08-15', '2019-08-15', '2019-08-15', '2019-08-15', '2019-08-15', '2019-09-15', '2019-09-15', '2019-09-15', '2019-09-15', '2019-09-15', '2019-09-15', '2019-09-15', '2019-09-15', '2019-09-15', '2019-09-15', '2019-10-15', '2019-10-15', '2019-10-15', '2019-10-15', '2019-10-15', '2019-10-15', '2019-10-15', '2019-10-15', '2019-10-15', '2019-10-15']
df['expiration'] = df['expiration'].apply(lambda x: pd.to_datetime(str(x), utc=True,format='%Y-%m-%d'))
df['strike'] = [170, 175, 180, 185, 190, 195, 200, 205, 165, 170, 175, 180, 185, 190, 195, 200, 205, 210, 160, 165, 170, 175, 180, 185, 190, 195, 200, 205]
df['undPrice'] = [undPrice, undPrice, undPrice, undPrice, undPrice, undPrice, undPrice, undPrice, undPrice, undPrice, undPrice, undPrice, undPrice, undPrice, undPrice, undPrice, undPrice, undPrice, undPrice, undPrice, undPrice, undPrice, undPrice, undPrice, undPrice, undPrice, undPrice, undPrice]
df['IV_model'] = [0.28, 0.27, 0.26, 0.25, 0.24, 0.23, 0.22, 0.21, 0.35, 0.34, 0.33, 0.32, 0.31, 0.30, 0.29, 0.28, 0.27, 0.26, 0.42, 0.41, 0.40, 0.39, 0.38, 0.37, 0.36, 0.35, 0.34, 0.33]
df['IV_model'] = df['IV_model'].map('{:.2%}'.format)
df['desired_outcome'] = [0.22, 0.22, 0.22, 0.22, 0.22, 0.22, 0.22, 0.22, 0.28, 0.28, 0.28, 0.28, 0.28, 0.28, 0.28, 0.28, 0.28, 0.28, 0.34, 0.34, 0.34, 0.34, 0.34, 0.34, 0.34, 0.34, 0.34, 0.34]
df['desired_outcome'] = df['desired_outcome'].map('{:.2%}'.format)
print(df)
This would be the (desired) result (obviously 'desired_outcome' is filled in manually):
expiration strike undPrice IV_model desired_outcome
0 2019-08-15 00:00:00+00:00 170 202 28.00% 22.00%
1 2019-08-15 00:00:00+00:00 175 202 27.00% 22.00%
2 2019-08-15 00:00:00+00:00 180 202 26.00% 22.00%
3 2019-08-15 00:00:00+00:00 185 202 25.00% 22.00%
4 2019-08-15 00:00:00+00:00 190 202 24.00% 22.00%
5 2019-08-15 00:00:00+00:00 195 202 23.00% 22.00%
6 2019-08-15 00:00:00+00:00 200 202 22.00% 22.00%
7 2019-08-15 00:00:00+00:00 205 202 21.00% 22.00%
8 2019-09-15 00:00:00+00:00 165 202 35.00% 28.00%
9 2019-09-15 00:00:00+00:00 170 202 34.00% 28.00%
10 2019-09-15 00:00:00+00:00 175 202 33.00% 28.00%
11 2019-09-15 00:00:00+00:00 180 202 32.00% 28.00%
12 2019-09-15 00:00:00+00:00 185 202 31.00% 28.00%
13 2019-09-15 00:00:00+00:00 190 202 30.00% 28.00%
14 2019-09-15 00:00:00+00:00 195 202 29.00% 28.00%
15 2019-09-15 00:00:00+00:00 200 202 28.00% 28.00%
16 2019-09-15 00:00:00+00:00 205 202 27.00% 28.00%
17 2019-09-15 00:00:00+00:00 210 202 26.00% 28.00%
18 2019-10-15 00:00:00+00:00 160 202 42.00% 34.00%
19 2019-10-15 00:00:00+00:00 165 202 41.00% 34.00%
20 2019-10-15 00:00:00+00:00 170 202 40.00% 34.00%
21 2019-10-15 00:00:00+00:00 175 202 39.00% 34.00%
22 2019-10-15 00:00:00+00:00 180 202 38.00% 34.00%
23 2019-10-15 00:00:00+00:00 185 202 37.00% 34.00%
24 2019-10-15 00:00:00+00:00 190 202 36.00% 34.00%
25 2019-10-15 00:00:00+00:00 195 202 35.00% 34.00%
26 2019-10-15 00:00:00+00:00 200 202 34.00% 34.00%
27 2019-10-15 00:00:00+00:00 205 202 33.00% 34.00%
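I am a relative beginner in Python programming. I have come a long way, but this is beyond my capabilities. I hope someone can help me with this.
Here is one way to do it:
Create a dictionary that maps each expiration to an IV_model value by finding, within each expiration, the row with the minimum distance between undPrice and the strike price.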
desiredOutcomeMap = df.groupby('expiration').apply(lambda x: df.loc[abs(x['undPrice']-x['strike']).idxmin(), 'IV_model']).to_dict()
Then map it back onto the original df.
df['desired_outcome'] = df['expiration'].map(desiredOutcomeMap)
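As a cross-check, the same lookup can be sketched without groupby.apply. This is only an illustration under stated assumptions: the small frame below is hypothetical and much shorter than the test case, IV_model is kept numeric rather than formatted as a percentage string, and the names distance, atm_rows, atm_iv and the column 'atm_iv' are made up for this sketch.

```python
import pandas as pd

# Hypothetical, shortened frame: two expirations, numeric IV values.
df = pd.DataFrame({
    "expiration": ["2019-08-15"] * 3 + ["2019-09-15"] * 3,
    "strike": [195, 200, 205, 195, 200, 205],
    "undPrice": [202] * 6,
    "IV_model": [0.23, 0.22, 0.21, 0.29, 0.28, 0.27],
})

# Absolute distance between each strike and the underlying price.
distance = (df["undPrice"] - df["strike"]).abs()

# Row label of the closest strike within each expiration.
atm_rows = distance.groupby(df["expiration"]).idxmin()

# Series indexed by expiration holding the IV_model value of that row.
atm_iv = df.loc[atm_rows.to_numpy()].set_index("expiration")["IV_model"]

# Broadcast the per-expiration value back onto every row.
df["atm_iv"] = df["expiration"].map(atm_iv)

print(df)
```

Note that idxmin returns the first row when two strikes are equally close, so with an underlying price exactly halfway between two strikes the row that appears first in the frame is used.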