Scraping boolstates of a TradingView chart
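I am trying to scrape a TradingView page that shows my own chart in order to read the boolstates.
This is what I mean, with this HTML code from the website.
I am using Debian/Linux on a server and programming in Python. I tried reading the page with BeautifulSoup and found that BeautifulSoup cannot run JavaScript, so it does not see everything that ends up in the rendered HTML.
This code only prints empty brackets []. So it does not find the class I am searching for.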
import requests
import soupsieve
from bs4 import BeautifulSoup
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36'}
url = 'https://de.tradingview.com/chart/zDAFlgZJ/#'
r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.content, 'html.parser')
output = soup.find_all('div', attrs={'class':'valueValue-3kA0oJs5'})
print(output)
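A minimal sketch of why the list comes back empty, assuming the target div is created by a script after the page loads: a parser that only sees the downloaded markup never encounters that element. The sample page and the static_classes helper are illustrative and are not TradingView's actual markup. The sketch uses the standard library's html.parser so it runs without extra packages.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collect every class name that appears on a real start tag."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def static_classes(html):
    """Return the set of class names present in the markup as downloaded."""
    collector = ClassCollector()
    collector.feed(html)
    collector.close()
    return collector.classes


# Hypothetical page: the div with the wanted class only exists after the
# script has run in a browser, so it is absent from the static markup.
SAMPLE_PAGE = """<html><body><div id="root"></div>
<script>
  var el = document.createElement("div");
  el.className = "valueValue-3kA0oJs5";
  document.getElementById("root").appendChild(el);
</script></body></html>"""

print(static_classes(SAMPLE_PAGE))
```

The class name appears only inside the script text, which the parser treats as raw data rather than as an element, so a lookup by class finds nothing until something actually executes the script.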
After that, I tried it with PyQt5, following this video.
I changed the video's script to PyQt5 but could not get the code to run.
The script outputs:
qt.qpa.screen: QXcbConnection: Could not connect to display :99
Could not connect to any X display.
But I have no screen, only a terminal.
import sys
from PyQt5.QtWidgets import QApplication
from PyQt5.QtCore import QUrl
from PyQt5.QtWebEngineWidgets import QWebEnginePage as QWebPage
import bs4 as bs
import urllib.request
import os
class Client(QWebPage):
    def __init__(self, url):
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)
        self.loadFinished.connect(self.on_page_load)
        self.mainFrame().load(QUrl(url))
        self.app.exec_()

    def on_page_load(self):
        self.app.quit()
url = 'https://pythonprogramming.net/parsememcparseface/'
client_response = Client(url)
source = client_response.mainFrame().toHtml()
soup = bs.BeautifulSoup(source, 'lxml')
js_test = soup.find('p', class_='jstest')
print(js_test.text)
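A hedged sketch of how the PyQt5 attempt might be made to run on a machine without a display. It rests on two points. First, Qt can be told to use its offscreen platform plugin through the QT_QPA_PLATFORM environment variable, which avoids the X display lookup. Whether QtWebEngine renders a given page correctly in that mode is an assumption to verify on your server. Second, QWebEnginePage has no mainFrame(); that method belongs to the older QtWebKit QWebPage. With QtWebEngine the page itself is loaded, and toHtml() delivers its result asynchronously to a callback. The function names below are illustrative.

```python
import os
import sys


def configure_headless_qt():
    """Select Qt's offscreen platform plugin unless one is already chosen."""
    os.environ.setdefault("QT_QPA_PLATFORM", "offscreen")
    return os.environ["QT_QPA_PLATFORM"]


def render_html(url):
    """Load url in QtWebEngine and return the HTML after scripts have run."""
    configure_headless_qt()
    # Imported here so the environment variable is set before Qt starts.
    from PyQt5.QtCore import QUrl
    from PyQt5.QtWebEngineWidgets import QWebEnginePage
    from PyQt5.QtWidgets import QApplication

    app = QApplication.instance() or QApplication(sys.argv)
    page = QWebEnginePage()
    result = {}

    def store_html(html):
        # Called by Qt once the serialized document is available.
        result["html"] = html
        app.quit()

    def on_load_finished(ok):
        # toHtml() is asynchronous in QtWebEngine, so pass a callback.
        page.toHtml(store_html)

    page.loadFinished.connect(on_load_finished)
    page.load(QUrl(url))
    app.exec_()
    return result.get("html", "")
```

Calling render_html('https://pythonprogramming.net/parsememcparseface/') would then return a string that can be passed to BeautifulSoup as in the script above.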
After that, I tried Selenium and Chromedriver, following this instruction. But the installation stopped after starting headless.sh:
./start_headless.sh: command not found
So I pasted it into the terminal manually and tried to start demo.py, but I got errors again.
With Python 2.7:
Traceback (most recent call last):
  File "demo.py", line 3, in <module>
    from pyvirtualdisplay import Display
  File "/usr/local/lib/python2.7/dist-packages/pyvirtualdisplay/__init__.py", line 4, in <module>
    from pyvirtualdisplay.display import Display
  File "/usr/local/lib/python2.7/dist-packages/pyvirtualdisplay/display.py", line 26
    backend: Optional[str] = None,
           ^
SyntaxError: invalid syntax
With Python 3.7:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/easyprocess/__init__.py", line 169, in start
    cmd, stdout=stdout, stderr=stderr, cwd=self.cwd, env=self.env,
  File "/usr/lib/python3.7/subprocess.py", line 775, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.7/subprocess.py", line 1522, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'Xvfb': 'Xvfb'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "demo.py", line 6, in <module>
    display = Display(visible=0, size=(800, 600))
  File "/usr/local/lib/python3.7/dist-packages/pyvirtualdisplay/display.py", line 63, in __init__
    **kwargs
  File "/usr/local/lib/python3.7/dist-packages/pyvirtualdisplay/xvfb.py", line 50, in __init__
    manage_global_env=manage_global_env,
  File "/usr/local/lib/python3.7/dist-packages/pyvirtualdisplay/abstractdisplay.py", line 88, in __init__
    helptext = get_helptext(program)
  File "/usr/local/lib/python3.7/dist-packages/pyvirtualdisplay/util.py", line 10, in get_helptext
    p.call()
  File "/usr/local/lib/python3.7/dist-packages/easyprocess/__init__.py", line 141, in call
    self.start().wait(timeout=timeout)
  File "/usr/local/lib/python3.7/dist-packages/easyprocess/__init__.py", line 174, in start
    raise EasyProcessError(self, "start error")
easyprocess.EasyProcessError: start error <EasyProcess cmd_param=['Xvfb', '-help'] cmd=['Xvfb', '-help'] oserror=[Errno 2] No such file or directory: 'Xvfb': 'Xvfb' return_code=None stdout="None" stderr="None" timeout_happened=False>
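Both tracebacks point at the environment rather than at demo.py. The Python 2.7 failure is a SyntaxError on an annotated parameter, which Python 2 cannot parse. The Python 3.7 failure says that no executable named Xvfb was found on the PATH. A small preflight check along the following lines could make that visible before pyvirtualdisplay is imported. The helper name and the default list of binaries are illustrative. On Debian the Xvfb binary is usually provided by the xvfb package, but check your own system.

```python
import shutil
import sys


def missing_requirements(required_binaries=("Xvfb",), min_python=(3, 0)):
    """Return human-readable problems that would stop a virtual display."""
    problems = []
    if sys.version_info < min_python:
        problems.append(
            "Python %d.%d is too old for annotated function parameters"
            % sys.version_info[:2]
        )
        # shutil.which does not exist on Python 2, so stop here.
        return problems
    for name in required_binaries:
        # shutil.which returns None when the executable is not on the PATH.
        if shutil.which(name) is None:
            problems.append("executable %r not found on PATH" % name)
    return problems


for problem in missing_requirements():
    print("preflight:", problem)
```

An empty result means the interpreter and the Xvfb binary are in place. It does not guarantee that the display actually starts.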
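I also tried a websocket, but with that I could only read data from the standard chart, so I will not go into it here.
Does anyone know how I can solve the original problem?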
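It may be a bit heavyweight, but have you considered using Selenium for this? You would be able to run a full browser. If I remember correctly, you can still use BeautifulSoup with it.
As for other routes, you may find a broker that provides this information through a proper API, which would obviously be the ideal case. Interactive Brokers comes to mind.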
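The Selenium suggestion could be sketched roughly as follows, assuming Chrome and a matching chromedriver are installed on the server. Chrome's own headless mode needs neither Xvfb nor pyvirtualdisplay. The argument list, the helper names and the idea of matching on a class fragment (because the suffix in valueValue-3kA0oJs5 looks generated and may change) are assumptions, not something confirmed for TradingView.

```python
# Assumed flags: --no-sandbox is often needed when running as root.
CHROME_ARGS = ("--headless", "--no-sandbox", "--disable-gpu")


def selector_for(class_fragment):
    """Build a CSS selector matching divs whose class contains the fragment."""
    return "div[class*='%s']" % class_fragment


def fetch_rendered_texts(url, class_fragment, timeout=20):
    """Open url in headless Chrome and return the text of matching divs."""
    from bs4 import BeautifulSoup
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    for arg in CHROME_ARGS:
        options.add_argument(arg)
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        selector = selector_for(class_fragment)
        # Wait until the scripts have created at least one matching element.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, selector))
        )
        # The rendered DOM can be handed to BeautifulSoup as usual.
        soup = BeautifulSoup(driver.page_source, "html.parser")
        return [div.get_text(strip=True) for div in soup.select(selector)]
    finally:
        driver.quit()
```

A call such as fetch_rendered_texts('https://de.tradingview.com/chart/zDAFlgZJ/#', 'valueValue') would return the visible values, provided the chart is accessible without logging in.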
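I figured out how to get the data that is generated by the JS. For anyone who wants to build something similar, here is the working script. It uses requests_html: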
from requests_html import HTMLSession
session = HTMLSession()
url = 'YOUR WEBSITE'
r = session.get(url)
r.html.render()
for item in r.html.xpath("//*[contains(@class,'CLASS NAME')]"):
    print(item.text)
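A slightly more defensive variant of the same idea, as a sketch. The first call to render() downloads a Chromium build, so it can take a while on a fresh server. render() also accepts sleep and timeout arguments, which may help when the values appear only some time after the page has loaded. The helper names and the default values below are assumptions to tune for your page.

```python
def class_xpath(fragment):
    """Build an XPath matching any element whose class contains fragment."""
    return "//*[contains(@class,'{}')]".format(fragment)


def rendered_texts(url, class_fragment, sleep=2, timeout=20):
    """Render url with requests_html and return the text of matching nodes."""
    from requests_html import HTMLSession

    session = HTMLSession()
    try:
        r = session.get(url)
        # sleep: seconds to wait after the initial render;
        # timeout: seconds before the render is abandoned.
        r.html.render(sleep=sleep, timeout=timeout)
        return [item.text for item in r.html.xpath(class_xpath(class_fragment))]
    finally:
        session.close()
```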