groupby and apply in pandas

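What I want to achieve: compute a volume-weighted daily return (VWDR) for each day, using the formula Volume * DailyReturn / cumulative Volume of each ticker. Because this has to be computed per ticker, I group by Ticker and then Date. This is the code I have so far: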

stock_data['VWDR'] = stock_data.groupby(['Ticker','Date'])[['Volume', 'DailyReturn']].sum().apply(lambda df: df['Volume']*df['DailyReturn']/ df['Volume'].cumsum())

This is the error message:

KeyError: 'Volume'
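
A likely cause of this KeyError, sketched on made-up data: after `.sum()` the result is an ordinary DataFrame, and `DataFrame.apply` calls the function once per column by default (`axis=0`). The lambda therefore receives a Series indexed by (Ticker, Date), and `df['Volume']` is looked up among the index labels rather than the columns. The frame `toy` and the helper `record` are illustrative names, not part of the original code.

```python
import pandas as pd

# Tiny synthetic frame standing in for stock_data (values are made up).
toy = pd.DataFrame({
    "Ticker": ["A", "A", "B", "B"],
    "Date": pd.to_datetime(["2020-05-01", "2020-05-04"] * 2),
    "Volume": [100, 300, 50, 50],
    "DailyReturn": [0.01, 0.02, -0.01, 0.03],
})

summed = toy.groupby(["Ticker", "Date"])[["Volume", "DailyReturn"]].sum()

seen = []

def record(col):
    # Note what apply actually passes in: one column at a time, as a Series.
    seen.append((type(col).__name__, col.name))
    return col

summed.apply(record)
print(seen)

# The lambda from the question fails because "Volume" is not an index label.
try:
    summed.apply(lambda df: df["Volume"] * df["DailyReturn"] / df["Volume"].cumsum())
    raised = False
except KeyError:
    raised = True
print("KeyError raised:", raised)
```

Grouping by both Ticker and Date also leaves one row per group here, so a cumulative sum inside such a group would not accumulate across days.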

The test data can be obtained as follows:

import pandas as pd
import yfinance as yf
# now just read the html to get all the S&P500 tickers 
dataload=pd.read_html('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
df = dataload[0]
# now get the first column(tickers) from the above data
# convert it into a list
ticker_list = df['Symbol'][25:35].values.tolist()
all_tickers = " ".join(ticker_list)
# get all the tickers from yfinance
tickers = yf.Tickers(all_tickers)
# set a start and end date to get two-years info
# group by the ticker
hist = tickers.history(start='2020-05-01', end='2022-05-01', group_by='ticker')
stock_data = pd.DataFrame(hist.stack(level=0).reset_index().rename(columns = {'level_1':'Ticker'}))
stock_data['DailyReturn'] = stock_data.sort_values(['Ticker', 'Date']).groupby('Ticker')['Close'].pct_change()
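
To illustrate what the last line does, here is a minimal sketch on made-up prices (the frame `toy` is hypothetical). The percentage change is computed within each ticker, so the first row of every ticker has no previous close and comes out as NaN, and the assignment aligns on the row index even though the frame was sorted for the calculation.

```python
import pandas as pd

# Made-up closes for two tickers, interleaved by date like stock_data.
toy = pd.DataFrame({
    "Ticker": ["A", "B", "A", "B"],
    "Date": pd.to_datetime(["2020-05-01", "2020-05-01", "2020-05-04", "2020-05-04"]),
    "Close": [10.0, 20.0, 11.0, 19.0],
})

# Same pattern as above: sort, group by ticker, take the change in Close.
toy["DailyReturn"] = (
    toy.sort_values(["Ticker", "Date"]).groupby("Ticker")["Close"].pct_change()
)
print(toy)
```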

If I extract a single ticker from the stock data table, the calculation works fine, as shown below:

AMZN = stock_data[stock_data.Ticker=='AMZN'].copy()
AMZN['VWDR'] = AMZN['Volume'] * AMZN['DailyReturn']/ AMZN['Volume'].cumsum()

But I am not sure what I am doing wrong in the groupby code. Is there a simpler way to achieve this?
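
One possibly simpler route, offered as a sketch rather than the definitive answer: only the cumulative volume needs to be grouped, so `groupby('Ticker')['Volume'].cumsum()` can be combined with plain column arithmetic and no `apply` at all. The data below is made up and `toy` and `cum_volume` are illustrative names. The sketch assumes rows are in date order within each ticker, which the sort ensures.

```python
import pandas as pd

# Made-up data for two tickers, interleaved by date like stock_data.
toy = pd.DataFrame({
    "Ticker": ["A", "B", "A", "B"],
    "Date": pd.to_datetime(["2020-05-01", "2020-05-01", "2020-05-04", "2020-05-04"]),
    "Volume": [100, 50, 300, 50],
    "DailyReturn": [0.01, -0.01, 0.02, 0.03],
})

# Cumulative volume restarts for every ticker; the result keeps the row index.
toy = toy.sort_values(["Ticker", "Date"])
cum_volume = toy.groupby("Ticker")["Volume"].cumsum()

# Same formula as the single-ticker AMZN version, applied to all tickers at once.
toy["VWDR"] = toy["Volume"] * toy["DailyReturn"] / cum_volume
print(toy)
```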

Add this:

import ssl
ssl._create_default_https_context = ssl._create_unverified_context

before this line:

dataload=pd.read_html('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')

I did that and got this result:

           Date Ticker        Close  ...  Stock Splits    Volume  DailyReturn
0    2020-05-01    AAL    10.640000  ...             0  99441400          NaN
1    2020-05-01    AEE    67.797997  ...             0   1520200          NaN
2    2020-05-01    AEP    75.347603  ...             0   2742100          NaN
3    2020-05-01   AMCR     7.925522  ...             0   4097600          NaN
4    2020-05-01    AMD    49.880001  ...             0  69562700          NaN
        ...    ...          ...  ...           ...       ...          ...
5035 2022-04-29    AMT   241.020004  ...             0   2151900    -0.044254
5036 2022-04-29   AMZN  2485.629883  ...             0  13616500    -0.140494
5037 2022-04-29    AXP   174.710007  ...             0   3210100    -0.039949
5038 2022-04-29   GOOG  2299.330078  ...             0   1683500    -0.037224
5039 2022-04-29     MO    55.570000  ...             0  10861600     0.006703

[5040 rows x 10 columns]

Then:

stock_data['VWDR'] = (stock_data['DailyReturn'].cumsum()*stock_data['Volume'].cumsum())/stock_data['Volume'].cumsum()

stock_data['VWDR']

Result.

Reference:

https://analyzingalpha.com/vwap

Full code:

import pandas as pd
import yfinance as yf

import ssl
ssl._create_default_https_context = ssl._create_unverified_context


# now just read the html to get all the S&P500 tickers 
dataload=pd.read_html('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
df = dataload[0]
# now get the first column(tickers) from the above data
# convert it into a list
ticker_list = df['Symbol'][25:35].values.tolist()
all_tickers = " ".join(ticker_list)
# get all the tickers from yfinance
tickers = yf.Tickers(all_tickers)
# set a start and end date to get two-years info
# group by the ticker
hist = tickers.history(start='2020-05-01', end='2022-05-01', group_by='ticker')
stock_data = pd.DataFrame(hist.stack(level=0).reset_index().rename(columns = {'level_1':'Ticker'}))
stock_data['DailyReturn'] = stock_data.sort_values(['Ticker', 'Date']).groupby('Ticker')['Close'].pct_change()

stock_data['VWDR'] = (stock_data['DailyReturn'].cumsum()*stock_data['Volume'].cumsum())/stock_data['Volume'].cumsum()

stock_data['VWDR']
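
A quick check on made-up data suggests the VWDR expression above may not give the per-ticker quantity the question asks for. The two `Volume.cumsum()` factors cancel, and the remaining `cumsum()` runs over the whole frame rather than within each ticker. The frame `toy` and the variable names are illustrative only.

```python
import pandas as pd

# Made-up data for two tickers, interleaved like stock_data.
toy = pd.DataFrame({
    "Ticker": ["A", "B", "A", "B"],
    "Volume": [100, 50, 300, 50],
    "DailyReturn": [0.01, -0.01, 0.02, 0.03],
})

# Expression from the code above, applied to the whole frame.
ungrouped = (toy["DailyReturn"].cumsum() * toy["Volume"].cumsum()) / toy["Volume"].cumsum()

# After cancelling, it is a running sum of returns across all tickers.
running_return = toy["DailyReturn"].cumsum()

# Formula from the question, with the cumulative volume taken per ticker.
per_ticker = toy["Volume"] * toy["DailyReturn"] / toy.groupby("Ticker")["Volume"].cumsum()

print(pd.DataFrame({"ungrouped": ungrouped, "per_ticker": per_ticker}))
```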

I created a function, 'func_data', that performs the calculation. The result goes into the 'test' column, which is first created filled with NaN values.

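import numpy as np

# Pre-create the result column filled with NaN.
stock_data['test'] = np.nan

def func_data(x):
    # x is the sub-frame for one ticker, so cumsum() restarts for each ticker.
    x['test'] = x['Volume'] * x['DailyReturn'] / x['Volume'].cumsum()

    return x

# group_keys=False keeps the original row index so the assignment aligns
# (pandas 2.0 and later otherwise add the Ticker key to the result index).
stock_data['test'] = stock_data.groupby(['Ticker'], group_keys=False).apply(func_data).iloc[:, -1]
print(AMZN)
print(stock_data)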

Output:

         Date Ticker        Close  ...    Volume  DailyReturn      test
0  2022-02-28   GOOG  2697.820068  ...   1483800          NaN       NaN
1  2022-02-28     MO    50.422642  ...   8646400          NaN       NaN
2  2022-03-01   GOOG  2683.360107  ...   1232000    -0.005360 -0.002431
3  2022-03-01     MO    50.697903  ...   9693000     0.005459  0.002885
4  2022-03-02   GOOG  2695.030029  ...   1198300     0.004349  0.001331
..        ...    ...          ...  ...       ...          ...       ...
83 2022-04-27     MO    54.919998  ...   7946600     0.000729  0.000015
84 2022-04-28   GOOG  2388.229980  ...   1839500     0.038176  0.001172
85 2022-04-28     MO    55.200001  ...   8153900     0.005098  0.000106
86 2022-04-29   GOOG  2299.330078  ...   1683500    -0.037224 -0.001017
87 2022-04-29     MO    55.570000  ...  10861600     0.006703  0.000180