Downloading multiple stocks at once from Yahoo Finance with Python
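I have a question about the Yahoo Finance functionality of pandas-datareader. For months I have been using a list of stock tickers and running it with the following lines: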
import pandas_datareader as pdr
import datetime
stocks = ["stock1","stock2",....]
start = datetime.datetime(2012,5,31)
end = datetime.datetime(2018,3,1)
f = pdr.DataReader(stocks, 'yahoo',start,end)
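Since yesterday I have been getting the error "IndexError: list index out of range", and it only appears when I try to fetch multiple stocks.
Has anything changed in the last few days that I need to take into account? Or do you have a better solution?
Update as of 2021-01-19
- At this point, the code in the OP runs without problems and can download multiple stocks.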
- Version: 0.9.0 Date: July 10, 2020
- GitHub: pydata / pandas-datareader
tickers = ['msft', 'aapl', 'twtr', 'intc', 'tsm', 'goog', 'amzn', 'fb', 'nvda']
df = pdr.DataReader(tickers, data_source='yahoo', start='2017-01-01', end='2020-09-28')
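Original answer
If you read through the Pandas DataReader documentation, you will see that they announced an immediate deprecation of several data source APIs, one of which is Yahoo! Finance.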
v0.6.0 (January 24, 2018)
Immediate deprecation of Yahoo!, Google Options and Quotes and EDGAR.
The end points behind these APIs have radically changed and the
existing readers require complete rewrites. In the case of most Yahoo!
data the endpoints have been removed. PDR would like to restore these
features, and pull requests are welcome.
This is probably the culprit behind your IndexError (or any other error that normally would not occur).
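Because a failure on one symbol can abort a whole multi-ticker request, one defensive pattern is to fetch each ticker separately and record which ones failed. The following is a minimal sketch and not part of the original answer: `fetch_each` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for a real downloader (for example a call to `pdr.DataReader` for a single ticker) so that the example runs offline.

```python
import pandas as pd


def fetch_each(tickers, fetch):
    """Call fetch(ticker) for every ticker; return (frames, failures)."""
    frames, failures = {}, {}
    for ticker in tickers:
        try:
            frames[ticker] = fetch(ticker)
        except Exception as exc:  # broad on purpose: remote readers raise many error types
            failures[ticker] = repr(exc)
    return frames, failures


def fake_fetch(ticker):
    """Hypothetical stand-in for a real reader; it fails for one symbol."""
    if ticker == "BAD":
        raise IndexError("list index out of range")
    idx = pd.to_datetime(["2018-02-27", "2018-02-28"])
    return pd.DataFrame({"Close": [1.0, 2.0]}, index=idx)


frames, failures = fetch_each(["AAA", "BAD", "BBB"], fake_fetch)
print("downloaded:", sorted(frames))
print("failed:", failures)
```

With a real reader you would pass a function that downloads one ticker in place of `fake_fetch`; the failures dictionary then tells you which symbols to retry or drop.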
However, there is another Python package whose goal is to fix the Yahoo! Finance support for Pandas DataReader. You can find that package here:
https://pypi.python.org/pypi/fix-yahoo-finance
According to their documentation:
Yahoo! finance has decommissioned their historical data API, causing many programs that relied on it to stop working.
fix-yahoo-finance offers a temporary fix to the problem by scraping the data from Yahoo! finance using and return a Pandas DataFrame/Panel in the same format as pandas_datareader’s get_data_yahoo().
By basically “hijacking” pandas_datareader.data.get_data_yahoo() method, fix-yahoo-finance’s implantation is easy and only requires to import fix_yahoo_finance into your code.
All you need to add is:
import datetime
from pandas_datareader import data as pdr
import fix_yahoo_finance as yf
yf.pdr_override()
stocks = ["stock1","stock2", ...]
start = datetime.datetime(2012,5,31)
end = datetime.datetime(2018,3,1)
f = pdr.get_data_yahoo(stocks, start=start, end=end)
You do not even need Pandas DataReader:
import datetime
import fix_yahoo_finance as yf
stocks = ["stock1","stock2", ...]
start = datetime.datetime(2012,5,31)
end = datetime.datetime(2018,3,1)
data = yf.download(stocks, start=start, end=end)
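Once several tickers have been downloaded in one call, you usually want a single field per ticker. The sketch below assumes the result is a DataFrame whose columns form a two-level index of (field, ticker); that layout is an assumption about the installed library version, so check `data.columns` on your own result first. The frame here is built by hand with made-up tickers and values so that the example runs offline.

```python
import pandas as pd

# Hand-built stand-in for a multi-ticker download (assumed column layout)
dates = pd.to_datetime(["2018-02-27", "2018-02-28", "2018-03-01"])
columns = pd.MultiIndex.from_product([["Close", "Volume"], ["AAA", "BBB"]])
data = pd.DataFrame(
    [[10.0, 20.0, 100, 200],
     [11.0, 19.0, 110, 210],
     [12.0, 21.0, 120, 220]],
    index=dates,
    columns=columns,
)

# One column per ticker for a single field
close = data["Close"]

# All fields for a single ticker
aaa = data.xs("AAA", axis=1, level=1)

print(close)
print(aaa)
```

If your result has the levels the other way round (ticker first, field second), swap the two selections: index by ticker directly and use `xs` with the field name.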
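You can do this with the new Python YahooFinancials module together with pandas. YahooFinancials is well built: it gets its data by hashing out the data store object present in each Yahoo Finance web page, so it is fast and relies neither on the old discontinued API nor on a web driver the way a scraper does. Data is returned as JSON, and you can pull as many stocks at once as you like by passing a list of stock/index tickers when you initialize the YahooFinancials class.
$ pip install yahoofinancials
Usage example: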
from yahoofinancials import YahooFinancials
import pandas as pd
# Select Tickers and stock history dates
ticker = 'AAPL'
ticker2 = 'MSFT'
ticker3 = 'INTC'
index = '^NDX'
freq = 'daily'
start_date = '2012-10-01'
end_date = '2017-10-01'
# Function to clean data extracts
def clean_stock_data(stock_data_list):
    new_list = []
    for rec in stock_data_list:
        # Records carrying a 'type' key are events (e.g. dividends), not prices
        if 'type' not in rec.keys():
            new_list.append(rec)
    return new_list
# Construct yahoo financials objects for data extraction
aapl_financials = YahooFinancials(ticker)
msft_financials = YahooFinancials(ticker2)
intc_financials = YahooFinancials(ticker3)
index_financials = YahooFinancials(index)
# Clean returned stock history data and remove dividend events from price history
daily_aapl_data = clean_stock_data(aapl_financials
                                   .get_historical_stock_data(start_date, end_date, freq)[ticker]['prices'])
daily_msft_data = clean_stock_data(msft_financials
                                   .get_historical_stock_data(start_date, end_date, freq)[ticker2]['prices'])
daily_intc_data = clean_stock_data(intc_financials
                                   .get_historical_stock_data(start_date, end_date, freq)[ticker3]['prices'])
daily_index_data = index_financials.get_historical_stock_data(start_date, end_date, freq)[index]['prices']
stock_hist_data_list = [{'NDX': daily_index_data}, {'AAPL': daily_aapl_data}, {'MSFT': daily_msft_data},
                        {'INTC': daily_intc_data}]
# Function to construct a data frame from a market index and three stocks
def build_data_frame(data_list1, data_list2, data_list3, data_list4):
    data_dict = {}
    i = 0
    for list_item in data_list2:
        if 'type' not in list_item.keys():
            # Rows are paired by position i, so all four lists must line up day for day
            data_dict.update({list_item['formatted_date']: {'NDX': data_list1[i]['close'],
                                                            'AAPL': list_item['close'],
                                                            'MSFT': data_list3[i]['close'],
                                                            'INTC': data_list4[i]['close']}})
            i += 1
    tseries = pd.to_datetime(list(data_dict.keys()))
    df = pd.DataFrame(data=list(data_dict.values()), index=tseries,
                      columns=['NDX', 'AAPL', 'MSFT', 'INTC']).sort_index()
    return df
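Pairing rows by position only works while every list covers exactly the same trading days. An alternative is to key each record by its date and let pandas align the columns. The following is a minimal sketch under that idea, not the answer author's method: `closes_by_date` and `build_aligned_frame` are hypothetical helper names, and the records are made-up values that only mimic the shape of the 'prices' entries.

```python
import pandas as pd


def closes_by_date(records):
    """Map formatted_date -> close, skipping event rows that carry a 'type' key."""
    return {r["formatted_date"]: r["close"] for r in records if "type" not in r}


def build_aligned_frame(series_by_name):
    """Build one column per name, aligned on a shared date index."""
    df = pd.DataFrame({name: closes_by_date(recs) for name, recs in series_by_name.items()})
    df.index = pd.to_datetime(df.index)
    return df.sort_index()


# Made-up records: BBB has an event row and is missing one trading day
aaa = [
    {"formatted_date": "2017-09-28", "close": 1.0},
    {"formatted_date": "2017-09-29", "close": 2.0},
]
bbb = [
    {"formatted_date": "2017-09-29", "close": 5.0},
    {"formatted_date": "2017-09-28", "type": "DIVIDEND", "amount": 0.1},
]

df = build_aligned_frame({"AAA": aaa, "BBB": bbb})
print(df)
```

A day that one ticker lacks shows up as NaN in that column instead of shifting every later row, which makes gaps visible rather than silent.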
Example with multiple stocks at once (returns a list of JSON objects, one per ticker):
from yahoofinancials import YahooFinancials
tech_stocks = ['AAPL', 'MSFT', 'INTC']
bank_stocks = ['WFC', 'BAC', 'C']
yahoo_financials_tech = YahooFinancials(tech_stocks)
yahoo_financials_banks = YahooFinancials(bank_stocks)
tech_cash_flow_data_an = yahoo_financials_tech.get_financial_stmts('annual', 'cash')
bank_cash_flow_data_an = yahoo_financials_banks.get_financial_stmts('annual', 'cash')
banks_net_ebit = yahoo_financials_banks.get_ebit()
tech_stock_price_data = yahoo_financials_tech.get_stock_price_data()
daily_bank_stock_prices = yahoo_financials_banks.get_historical_stock_data('2008-09-15', '2017-09-15', 'daily')
Example JSON output:
Code:
yahoo_financials = YahooFinancials('WFC')
print(yahoo_financials.get_historical_stock_data("2017-09-10", "2017-10-10", "monthly"))
JSON Return:
{
    "WFC": {
        "prices": [
            {
                "volume": 260271600,
                "formatted_date": "2017-09-30",
                "high": 55.77000045776367,
                "adjclose": 54.91999816894531,
                "low": 52.84000015258789,
                "date": 1506830400,
                "close": 54.91999816894531,
                "open": 55.15999984741211
            }
        ],
        "eventsData": [],
        "firstTradeDate": {
            "date": 76233600,
            "formatted_date": "1972-06-01"
        },
        "isPending": false,
        "timeZone": {
            "gmtOffset": -14400
        },
        "id": "1mo15050196001507611600"
    }
}
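To work with this output in pandas, the 'prices' list can be loaded directly into a DataFrame. The sketch below is one way to do it, using only the record shown above, copied into a Python dict so that no network call is needed. Indexing by `formatted_date` and dropping the raw epoch `date` column are choices made for this example, not something the library requires.

```python
import pandas as pd

# The sample response from above, trimmed to the fields used here
response = {
    "WFC": {
        "prices": [
            {
                "volume": 260271600,
                "formatted_date": "2017-09-30",
                "high": 55.77000045776367,
                "adjclose": 54.91999816894531,
                "low": 52.84000015258789,
                "date": 1506830400,
                "close": 54.91999816894531,
                "open": 55.15999984741211,
            }
        ],
        "eventsData": [],
    }
}

# One row per price record, indexed by the human-readable date
prices = pd.DataFrame(response["WFC"]["prices"])
prices["formatted_date"] = pd.to_datetime(prices["formatted_date"])
prices = prices.set_index("formatted_date").drop(columns="date").sort_index()

print(prices[["open", "high", "low", "close", "adjclose", "volume"]])
```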
yahoo_finance no longer works because Yahoo changed its format, but fix_yahoo_finance can download the data. To analyze it you need other libraries, though. Here is a simple working example:
import numpy as np  # python library for scientific computing
import pandas as pd  # python library for data manipulation and analysis
import matplotlib.pyplot as plt  # python library for charting
import fix_yahoo_finance as yf  # python library to scrape data from yahoo finance
from pandas_datareader import data as pdr  # extract data from internet sources into pandas data frame
yf.pdr_override()
data = pdr.get_data_yahoo('^DJI', start="2006-01-01")
data2 = pdr.get_data_yahoo("MSFT", start="2006-01-01")
data3 = pdr.get_data_yahoo("AAPL", start="2006-01-01")
data4 = pdr.get_data_yahoo("BB.TO", start="2006-01-01")
# Rebase each closing-price series to 100 at its first observation and plot them on one axis
ax = (data['Close'] / data['Close'].iloc[0] * 100).plot(figsize=(15, 6))
(data2['Close'] / data2['Close'].iloc[0] * 100).plot(ax=ax, figsize=(15, 6))
(data3['Close'] / data3['Close'].iloc[0] * 100).plot(ax=ax, figsize=(15, 6))
(data4['Close'] / data4['Close'].iloc[0] * 100).plot(ax=ax, figsize=(15, 6))
plt.legend(['Dow Jones', 'Microsoft', 'Apple', 'Blackberry'], loc='upper left')
plt.show()
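The expression `series / series.iloc[0] * 100` rebases a price series so that it starts at 100, which makes instruments with very different price levels comparable on one chart. As a small offline sketch with made-up tickers and values, the same rebasing can be applied to all columns of one DataFrame at once:

```python
import pandas as pd

# Made-up closing prices for two hypothetical tickers
dates = pd.to_datetime(["2006-01-03", "2006-01-04", "2006-01-05"])
close = pd.DataFrame(
    {"AAA": [50.0, 55.0, 60.0], "BBB": [200.0, 190.0, 210.0]},
    index=dates,
)

# Divide every column by its own first value, then scale to 100
rebased = close / close.iloc[0] * 100

print(rebased.round(2))
```

Dividing a DataFrame by `close.iloc[0]` aligns the first row's values with the column labels, so each column is divided by its own starting price.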
Try this simple code:
import pandas as pd
from pandas_datareader import data as wb
watchlist = ["stock1", "stock2", ...]
frames = []
for symbol in watchlist:
    print("Generating closing price for", symbol)
    result = wb.DataReader(symbol, start='05-1-20', end='05-20-20', data_source='yahoo')
    result["SYMBOL"] = symbol  # tag every row with its ticker
    frames.append(result)
# Stack the per-ticker frames into one long table
closing_price = pd.concat(frames)
print(closing_price)
This worked for me.
import pandas as pd
from yahoofinancials import YahooFinancials
assets = ['TSLA', 'MSFT', 'FB']
yahoo_financials = YahooFinancials(assets)
data = yahoo_financials.get_historical_price_data(start_date='2019-01-01',
                                                  end_date='2019-12-31',
                                                  time_interval='weekly')
# One column per asset, indexed by date, holding the adjusted close
prices_df = pd.DataFrame({
    a: {x['formatted_date']: x['adjclose'] for x in data[a]['prices']} for a in assets})
prices_df
Result: a DataFrame indexed by date, with one column of adjusted closing prices per ticker.