How to retrieve the list of values from a drop-down list
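I am trying to retrieve the list of available option expiries for a given ticker on Yahoo Finance.
For example, using SPY as the ticker on https://finance.yahoo.com/quote/SPY/options, the list of expiries is in a drop-down list: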
<div class="Fl(start) Pend(18px) option-contract-control drop-down-selector" data-reactid="4">
<select class="Fz(s)" data-reactid="5">
<option selected="" value="1576627200" data-reactid="6">December 18, 2019</option>
<option value="1576800000" data-reactid="7">December 20, 2019</option>
<option value="1577059200" data-reactid="8">December 23, 2019</option>
...
</select>
</div>
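Using the div class name (or the select class name, but there seem to be several of those on the page), I get the list values as a single string of concatenated expiries.
My function (I pass ticker='SPY' from the main function):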
from bs4 import BeautifulSoup
from selenium import webdriver

def get_list_expiries(ticker):
    browser = webdriver.Chrome()
    options_url = "https://finance.yahoo.com/quote/" + str(ticker) + "/options"
    browser.get(options_url)
    html_source = browser.page_source
    soup = BeautifulSoup(html_source, 'html.parser')
    expiries_dt = []
    for exp in soup.find_all(class_="Fl(start) Pend(18px) option-contract-control drop-down-selector"):
        expiries_dt.append(exp.text)
    browser.quit()
    return expiries_dt
This produces:
['December 18, 2019December 20, 2019December 23, 2019December 24, 2019December 27, 2019December 30, 2019...']
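I know I need to use Selenium for this, but I can't figure out how. The result is always a list containing a single string. Ideally, I would like to return two lists: one with the Unix datestamps (the option values, e.g. value="1576627200") and another with the 'normal' dates (i.e. 18/12/2019).
Any help would be greatly appreciated.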
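The single concatenated string comes from reading .text on the whole div. That joins the text of every <option> inside it. If you iterate over the <option> elements one at a time, you get one entry per expiry, and each element's value attribute holds the Unix datestamp. The sketch below shows this with the standard library's html.parser on the static snippet from the question, so it needs neither Selenium nor BeautifulSoup. The class name OptionCollector is hypothetical. With BeautifulSoup, the equivalent is calling find_all('option') on the matched element and reading each option's 'value' attribute and get_text().

```python
from html.parser import HTMLParser

# Static snippet copied from the question (the "..." entries omitted).
SNIPPET = """
<div class="Fl(start) Pend(18px) option-contract-control drop-down-selector" data-reactid="4">
<select class="Fz(s)" data-reactid="5">
<option selected="" value="1576627200" data-reactid="6">December 18, 2019</option>
<option value="1576800000" data-reactid="7">December 20, 2019</option>
<option value="1577059200" data-reactid="8">December 23, 2019</option>
</select>
</div>
"""


class OptionCollector(HTMLParser):
    """Hypothetical helper: collects each <option>'s value and visible label."""

    def __init__(self):
        super().__init__()
        self.values = []  # Unix datestamps, as strings
        self.labels = []  # human-readable dates
        self._in_option = False

    def handle_starttag(self, tag, attrs):
        if tag == "option":
            self._in_option = True
            self.values.append(dict(attrs).get("value"))
            self.labels.append("")

    def handle_endtag(self, tag):
        if tag == "option":
            self._in_option = False
            # Trim surrounding whitespace once the whole label has been read.
            self.labels[-1] = self.labels[-1].strip()

    def handle_data(self, data):
        # Text between <option> and </option> is the label.
        if self._in_option:
            self.labels[-1] += data


parser = OptionCollector()
parser.feed(SNIPPET)
print(parser.values)  # ['1576627200', '1576800000', '1577059200']
print(parser.labels)  # ['December 18, 2019', 'December 20, 2019', 'December 23, 2019']
```

This only parses the HTML it is given. On the live page you still need HTML that actually contains the options, for example browser.page_source from the question's code.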
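To extract the Unix datestamps and the expiry dates, you have to induce WebDriverWait for element_to_be_clickable(), and you can use the following locator strategy:
Code block: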
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select

# Chrome options: start maximized and hide the automation banner/extension.
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
# executable_path is the Selenium 3 style; Selenium 4 passes the driver path
# via service=Service(...) from selenium.webdriver.chrome.service instead.
driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
driver.get('https://finance.yahoo.com/quote/SPY/options')
# Wait up to 20 s for the expiry <select> to be clickable, then wrap it in Select.
select = Select(WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div.option-contract-control.drop-down-selector>select"))))
# Each <option>'s value attribute is the Unix datestamp...
print("Unix datestamp: ")
print([option.get_attribute("value") for option in select.options])
# ...and its inner HTML is the human-readable date.
print("Dates: ")
print([option.get_attribute("innerHTML") for option in select.options])
Console output:
Unix datestamp:
['1576627200', '1576800000', '1577059200', '1577145600', '1577404800', '1577664000', '1577750400', '1578009600', '1578268800', '1578441600', '1578614400', '1578873600', '1579046400', '1579219200', '1579564800', '1579824000', '1580428800', '1582243200', '1584662400', '1585612800', '1587081600', '1589500800', '1592524800', '1593475200', '1594944000', '1600387200', '1601424000', '1602806400', '1605830400', '1606780800', '1608249600', '1610668800', '1616112000', '1623974400', '1631836800', '1639699200', '1642723200']
Dates:
['December 18, 2019', 'December 20, 2019', 'December 23, 2019', 'December 24, 2019', 'December 27, 2019', 'December 30, 2019', 'December 31, 2019', 'January 3, 2020', 'January 6, 2020', 'January 8, 2020', 'January 10, 2020', 'January 13, 2020', 'January 15, 2020', 'January 17, 2020', 'January 21, 2020', 'January 24, 2020', 'January 31, 2020', 'February 21, 2020', 'March 20, 2020', 'March 31, 2020', 'April 17, 2020', 'May 15, 2020', 'June 19, 2020', 'June 30, 2020', 'July 17, 2020', 'September 18, 2020', 'September 30, 2020', 'October 16, 2020', 'November 20, 2020', 'December 1, 2020', 'December 18, 2020', 'January 15, 2021', 'March 19, 2021', 'June 18, 2021', 'September 17, 2021', 'December 17, 2021', 'January 21, 2022']
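The question asks for the second list in day/month/year form (18/12/2019), not the labels shown in the drop-down. Because each option value is a Unix timestamp, one way to get that form is to convert the values directly with the standard datetime module. The sketch below assumes the timestamps represent midnight UTC. That holds for the sample values above: 1576627200 is 2019-12-18 00:00 UTC. The function name to_day_month_year is hypothetical.

```python
from datetime import datetime, timezone


def to_day_month_year(timestamps):
    """Hypothetical helper: convert Unix timestamps (str or int) to 'dd/mm/YYYY'.

    Assumes the timestamps are midnight UTC, so the calendar date is read in UTC
    rather than in the machine's local time zone.
    """
    return [
        datetime.fromtimestamp(int(ts), tz=timezone.utc).strftime("%d/%m/%Y")
        for ts in timestamps
    ]


# First three values from the console output above.
unix_stamps = ['1576627200', '1576800000', '1577059200']
print(to_day_month_year(unix_stamps))  # ['18/12/2019', '20/12/2019', '23/12/2019']
```

Reading the date in UTC is deliberate. A local-time conversion west of UTC would shift these midnight timestamps back to the previous day.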
Try SimplifiedDoc, a library for extracting data from HTML:
from simplified_scrapy.simplified_doc import SimplifiedDoc
html='''<div class="Fl(start) Pend(18px) option-contract-control drop-down-selector" data-reactid="4">
<select class="Fz(s)" data-reactid="5">
<option selected="" value="1576627200" data-reactid="6">December 18, 2019</option>
<option value="1576800000" data-reactid="7">December 20, 2019</option>
<option value="1577059200" data-reactid="8">December 23, 2019</option>
...
</select>
</div>
'''
doc = SimplifiedDoc(html)
div = doc.getElementByClass('Fl(start) Pend(18px) option-contract-control drop-down-selector')
options = div.options # get all options
expiries_dt = [option.html for option in options]
print(expiries_dt) # ['December 18, 2019', 'December 20, 2019', 'December 23, 2019']
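You don't need Selenium here (honestly, it's overkill for most Yahoo Finance information). You can extract the timestamps from the response text, using ast to convert the string representation of the returned list into an actual list, and then use the datetime module to convert them to the desired date format.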
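import requests, re, ast
from datetime import datetime

# Fetch the raw page; this approach relies on the expiry timestamps appearing in the response text.
r = requests.get('https://finance.yahoo.com/quote/SPY/options?guccounter=1')
# Capture the list that follows the "expirationDates" key.
p = re.compile(r'"expirationDates":(\[.*?\])')
# Turn the captured string, e.g. "[1576627200, ...]", into a Python list of ints.
timestamps = ast.literal_eval(p.findall(r.text)[0])
# Convert each Unix timestamp to a date string such as "December 18, 2019".
dates = [datetime.utcfromtimestamp(ts).strftime("%B %d, %Y") for ts in timestamps]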
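Regex explanation: "expirationDates": matches the literal key in the page source, and (\[.*?\]) captures the list that follows it. The lazy .*? makes the match stop at the first closing ], so only that one list is captured.
Datetime conversion:
- See the discussion by @jfs, which is where I first saw utcfromtimestamp
- strftime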
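The same regex-plus-literal_eval approach can be wrapped in a function that returns the two lists the question asks for. Written that way, it can be tested offline against a string. The sample text below is a hypothetical stand-in for the page source, not real Yahoo output, and the function name get_expiries_from_source is only an illustration. The sketch makes two changes to the answer's code. It replaces datetime.utcfromtimestamp, which is deprecated since Python 3.12, with the timezone-aware datetime.fromtimestamp(ts, tz=timezone.utc). It also raises a clear ValueError when the key is missing (for example, if the page layout changes) instead of an IndexError.

```python
import ast
import re
from datetime import datetime, timezone

# Same pattern as in the answer: capture the list after the "expirationDates" key.
EXPIRATION_RE = re.compile(r'"expirationDates":(\[.*?\])')


def get_expiries_from_source(page_text):
    """Hypothetical helper: return (timestamps, 'dd/mm/YYYY' dates) from page text."""
    match = EXPIRATION_RE.search(page_text)
    if match is None:
        raise ValueError('"expirationDates" not found in page text')
    timestamps = ast.literal_eval(match.group(1))  # "[1, 2]" -> [1, 2]
    dates = [
        datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%d/%m/%Y")
        for ts in timestamps
    ]
    return timestamps, dates


# Hypothetical fragment shaped like the text the regex targets.
sample = '{"expirationDates":[1576627200,1576800000],"other":[1.5]}'
stamps, dates = get_expiries_from_source(sample)
print(stamps)  # [1576627200, 1576800000]
print(dates)   # ['18/12/2019', '20/12/2019']
```

In real use you would pass r.text from the requests call above as page_text.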