groupby and apply in pandas
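What I want to achieve: compute the volume-weighted daily return for each ticker (formula: volume * daily return / cumulative volume of that ticker). Because this should be computed per ticker, I used groupby on Ticker and then Date.
This is the code I have now: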
stock_data['VWDR'] = stock_data.groupby(['Ticker','Date'])[['Volume', 'DailyReturn']].sum().apply(lambda df: df['Volume']*df['DailyReturn']/ df['Volume'].cumsum())
This is the error message:
KeyError: 'Volume'
The code below fetches the test data:
import pandas as pd
import yfinance as yf
# now just read the html to get all the S&P500 tickers
dataload=pd.read_html('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
df = dataload[0]
# now get the first column(tickers) from the above data
# convert it into a list
ticker_list = df['Symbol'][25:35].values.tolist()
all_tickers = " ".join(ticker_list)
# get all the tickers from yfinance
tickers = yf.Tickers(all_tickers)
# set a start and end date to get two-years info
# group by the ticker
hist = tickers.history(start='2020-05-01', end='2022-05-01', group_by='ticker')
stock_data = pd.DataFrame(hist.stack(level=0).reset_index().rename(columns = {'level_1':'Ticker'}))
stock_data['DailyReturn'] = stock_data.sort_values(['Ticker', 'Date']).groupby('Ticker')['Close'].pct_change()
If I extract a single ticker from the stock_data table, it works fine, as shown below:
AMZN = stock_data[stock_data.Ticker=='AMZN'].copy()
AMZN['VWDR'] = AMZN['Volume'] * AMZN['DailyReturn']/ AMZN['Volume'].cumsum()
But I am not sure what I am doing wrong in the groupby code. Or is there a simpler way to achieve this?
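A likely reading of the KeyError, sketched on a small synthetic frame (the tickers and numbers are made up): after `.sum()` the result is an ordinary DataFrame indexed by (Ticker, Date), and `DataFrame.apply` with its default `axis=0` hands the function one column at a time as a Series. Inside the lambda, `df['Volume']` is then a label lookup in that Series' index, which has no 'Volume' entry.

```python
import pandas as pd

# Synthetic stand-in for stock_data (hypothetical tickers and values)
toy = pd.DataFrame({
    "Ticker": ["AAA", "AAA", "BBB", "BBB"],
    "Date": pd.to_datetime(["2022-01-03", "2022-01-04"] * 2),
    "Volume": [100, 200, 50, 150],
    "DailyReturn": [0.01, -0.02, 0.03, 0.00],
})

# Same shape of call as in the question: one row per (Ticker, Date)
summed = toy.groupby(["Ticker", "Date"])[["Volume", "DailyReturn"]].sum()

# Record what apply actually passes in: each call receives one column
seen = []

def record(col):
    seen.append(col.name)
    return col.sum()

summed.apply(record)
print(seen)

# Looking up 'Volume' inside a single column fails, as in the question
try:
    summed.apply(lambda col: col["Volume"])
    raised = False
except KeyError as exc:
    raised = True
    print("KeyError:", exc)
```

If this reading is right, the lambda never sees a frame with both columns, so the formula cannot be evaluated there regardless of the grouping keys.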
Add this:
import ssl
ssl._create_default_https_context = ssl._create_unverified_context
before this line:
dataload=pd.read_html('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
I did this and got the following result:
Date Ticker Close ... Stock Splits Volume DailyReturn
0 2020-05-01 AAL 10.640000 ... 0 99441400 NaN
1 2020-05-01 AEE 67.797997 ... 0 1520200 NaN
2 2020-05-01 AEP 75.347603 ... 0 2742100 NaN
3 2020-05-01 AMCR 7.925522 ... 0 4097600 NaN
4 2020-05-01 AMD 49.880001 ... 0 69562700 NaN
... ... ... ... ... ... ...
5035 2022-04-29 AMT 241.020004 ... 0 2151900 -0.044254
5036 2022-04-29 AMZN 2485.629883 ... 0 13616500 -0.140494
5037 2022-04-29 AXP 174.710007 ... 0 3210100 -0.039949
5038 2022-04-29 GOOG 2299.330078 ... 0 1683500 -0.037224
5039 2022-04-29 MO 55.570000 ... 0 10861600 0.006703
[5040 rows x 10 columns]
Then:
stock_data['VWDR'] = (stock_data['DailyReturn'].cumsum()*stock_data['Volume'].cumsum())/stock_data['Volume'].cumsum()
stock_data['VWDR']
Result.
Reference:
https://analyzingalpha.com/vwap
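A quick sanity check on made-up numbers: the cumulative volume appears in both the numerator and the denominator of the VWDR expression in this answer, so it seems to cancel and leave a running sum of DailyReturn over all rows, regardless of ticker. That differs from the per-ticker formula in the question, so it may be worth verifying before relying on it.

```python
import numpy as np
import pandas as pd

# Synthetic rows interleaving two hypothetical tickers, as in stock_data
toy = pd.DataFrame({
    "Ticker": ["AAA", "BBB", "AAA", "BBB", "AAA", "BBB"],
    "Volume": [100, 50, 200, 150, 300, 250],
    "DailyReturn": [np.nan, np.nan, 0.01, 0.03, -0.02, 0.00],
})

# The expression from the answer, evaluated on the toy data
expr = (toy["DailyReturn"].cumsum() * toy["Volume"].cumsum()) / toy["Volume"].cumsum()

# Plain running sum of returns across all rows (no grouping by ticker)
running_return = toy["DailyReturn"].cumsum()

# True when the two agree (NaN positions treated as equal)
same = np.allclose(expr, running_return, equal_nan=True)
print(same)
```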
Full code:
import pandas as pd
import yfinance as yf
import ssl
ssl._create_default_https_context = ssl._create_unverified_context
# now just read the html to get all the S&P500 tickers
dataload=pd.read_html('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
df = dataload[0]
# now get the first column(tickers) from the above data
# convert it into a list
ticker_list = df['Symbol'][25:35].values.tolist()
all_tickers = " ".join(ticker_list)
# get all the tickers from yfinance
tickers = yf.Tickers(all_tickers)
# set a start and end date to get two-years info
# group by the ticker
hist = tickers.history(start='2020-05-01', end='2022-05-01', group_by='ticker')
stock_data = pd.DataFrame(hist.stack(level=0).reset_index().rename(columns = {'level_1':'Ticker'}))
stock_data['DailyReturn'] = stock_data.sort_values(['Ticker', 'Date']).groupby('Ticker')['Close'].pct_change()
stock_data['VWDR'] = (stock_data['DailyReturn'].cumsum()*stock_data['Volume'].cumsum())/stock_data['Volume'].cumsum()
stock_data['VWDR']
I created a function 'func_data' that performs the calculation. The result is placed in the 'test' column, which is first created with NaN values.
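import numpy as np
stock_data['test'] = np.nan
def func_data(x):
    # x holds the rows of one ticker, so cumsum runs within that ticker only
    x['test'] = x['Volume'] * x['DailyReturn'] / x['Volume'].cumsum()
    return x
# group_keys=False keeps the original row index so the assignment aligns
stock_data['test'] = stock_data.groupby(['Ticker'], group_keys=False).apply(func_data).iloc[:, -1]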
print(AMZN)
print(stock_data)
Output:
Date Ticker Close ... Volume DailyReturn test
0 2022-02-28 GOOG 2697.820068 ... 1483800 NaN NaN
1 2022-02-28 MO 50.422642 ... 8646400 NaN NaN
2 2022-03-01 GOOG 2683.360107 ... 1232000 -0.005360 -0.002431
3 2022-03-01 MO 50.697903 ... 9693000 0.005459 0.002885
4 2022-03-02 GOOG 2695.030029 ... 1198300 0.004349 0.001331
.. ... ... ... ... ... ... ...
83 2022-04-27 MO 54.919998 ... 7946600 0.000729 0.000015
84 2022-04-28 GOOG 2388.229980 ... 1839500 0.038176 0.001172
85 2022-04-28 MO 55.200001 ... 8153900 0.005098 0.000106
86 2022-04-29 GOOG 2299.330078 ... 1683500 -0.037224 -0.001017
87 2022-04-29 MO 55.570000 ... 10861600 0.006703 0.000180
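As a possible simpler alternative, the same formula can be written without `apply`, because `groupby(...)[column].cumsum()` returns a Series aligned to the original rows. This is a minimal sketch on synthetic data (the tickers, dates and values are made up), assuming rows are sorted by date within each ticker so the cumulative sum runs in time order.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for stock_data (hypothetical values)
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2022-01-03", "2022-01-03", "2022-01-04",
                            "2022-01-04", "2022-01-05", "2022-01-05"]),
    "Ticker": ["AAA", "BBB"] * 3,
    "Volume": [100, 50, 200, 150, 300, 250],
    "DailyReturn": [np.nan, np.nan, 0.01, 0.03, -0.02, 0.00],
})

# Sort so that each ticker's cumulative volume accumulates in date order
toy = toy.sort_values(["Ticker", "Date"])

# Per-ticker cumulative volume, aligned row by row with toy
cum_volume = toy.groupby("Ticker")["Volume"].cumsum()

# volume * daily return / cumulative volume of the same ticker
toy["VWDR"] = toy["Volume"] * toy["DailyReturn"] / cum_volume

# Cross-check against the single-ticker calculation from the question
aaa = toy[toy["Ticker"] == "AAA"]
expected = aaa["Volume"] * aaa["DailyReturn"] / aaa["Volume"].cumsum()
print(np.allclose(aaa["VWDR"], expected, equal_nan=True))
```

On the real data the same three lines would apply to `stock_data` directly, with `VWDR` as the target column.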