Code 503 in Flask with Embedded Bokeh Server App fetching jsonified data through requests.get()

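I am parameterizing my Bokeh app by having my Flask app expose model data through a route dedicated to jsonifying the requested data, which is passed via query string arguments. I know the data-sending route works, because when I use it as the url of an AjaxDataSource I get the expected data. However, when I attempt the equivalent operation with the requests.get API, I get a 503 response code. That makes me think I am violating something fundamental here that my limited webdev experience can't quite grasp. What am I doing wrong or violating?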

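I actually need more data-retrieval flexibility than AjaxDataSource provides with its columnar limitations. I was hoping to lean on the requests module to pass arbitrary class instances and whatnot around by serializing and deserializing JSON.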
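To make the goal concrete, here is a minimal sketch of round-tripping a class instance through JSON using only the standard library. The `ModelSlice` class, the `__type__` tag, and the helper names are hypothetical and exist only for this illustration; they are not part of Bokeh, Flask, or requests.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ModelSlice:
    """Hypothetical container for one column pair of model data."""
    name: str
    x: list
    y: list


def to_json(obj: ModelSlice) -> str:
    # Tag the payload with the class name so the receiver knows what to rebuild.
    return json.dumps({"__type__": type(obj).__name__, "fields": asdict(obj)})


def from_json(text: str) -> ModelSlice:
    payload = json.loads(text)
    if payload["__type__"] != "ModelSlice":
        raise ValueError("unexpected type: %s" % payload["__type__"])
    return ModelSlice(**payload["fields"])


original = ModelSlice(name="c1", x=[0, 1, 2], y=[0, 1, 4])
restored = from_json(to_json(original))
print(restored)
```

The string produced by `to_json` is what a route such as sendModelData could return, and `from_json` is what the receiving side would call on the response text.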

Here is my minimal example demonstrating the failure, flask_embed.html...

import requests
from flask import Flask, jsonify, render_template
import pandas
from tornado.ioloop import IOLoop

from bokeh.application          import Application
from bokeh.application.handlers import FunctionHandler
from bokeh.embed                import server_document
from bokeh.layouts              import column
from bokeh.models               import AjaxDataSource,ColumnDataSource
from bokeh.plotting             import figure
from bokeh.server.server        import Server

flask_app = Flask(__name__)

# Populate some model maintained by the flask application
modelDf = pandas.DataFrame()
nData = 100
modelDf[ 'c1_x' ] = range(nData)
modelDf[ 'c1_y' ] = [ x*x for x in range(nData) ]
modelDf[ 'c2_x' ] = range(nData)
modelDf[ 'c2_y' ] = [ 2*x for x in range(nData) ]

def modify_doc1(doc):
    # get column name from query string
    args      = doc.session_context.request.arguments
    paramName = str( args['colName'][0].decode('utf-8') )

    # get model data from Flask
    url    = "http://localhost:8080/sendModelData/%s" % paramName 
    source = AjaxDataSource( data             = dict( x=[] , y=[] ) ,
                            data_url         = url       ,
                            polling_interval = 5000      ,
                            mode             = 'replace' ,
                            method           = 'GET'     )
    # plot the model data
    plot = figure( )
    plot.circle( 'x' , 'y' , source=source , size=2 )
    doc.add_root(column(plot))

def modify_doc2(doc):
    # get column name from query string
    args    = doc.session_context.request.arguments
    colName = str( args['colName'][0].decode('utf-8') )

    # get model data from Flask
    url = "http://localhost:8080/sendModelData/%s" % colName
    #pdb.set_trace()
    res = requests.get( url , timeout=None , verify=False )
    print( "CODE %s" % res.status_code )
    print( "ENCODING %s" % res.encoding )
    print( "TEXT %s" % res.text )
    data = res.json()

    # plot the model data
    plot = figure()
    plot.circle( 'x' , 'y' , source=data , size=2 )
    doc.add_root(column(plot))


bokeh_app1 = Application(FunctionHandler(modify_doc1))
bokeh_app2 = Application(FunctionHandler(modify_doc2))

io_loop = IOLoop.current()

server = Server({'/bkapp1': bokeh_app1 , '/bkapp2' : bokeh_app2 }, io_loop=io_loop, allow_websocket_origin=["localhost:8080"])
server.start()

@flask_app.route('/', methods=['GET'] )
def index():
    res =  "<table>"
    res += "<tr><td><a href=\"http://localhost:8080/app1/c1\">APP1 C1</a></td></tr>"
    res += "<tr><td><a href=\"http://localhost:8080/app1/c2\">APP1 C2</a></td></tr>"
    res += "<tr><td><a href=\"http://localhost:8080/app2/c1\">APP2 C1</a></td></tr>"
    res += "<tr><td><a href=\"http://localhost:8080/app2/c2\">APP2 C2</a></td></tr>"
    res += "<tr><td><a href=\"http://localhost:8080/sendModelData/c1\">DATA C1</a></td></tr>"
    res += "<tr><td><a href=\"http://localhost:8080/sendModelData/c2\">DATA C2</a></td></tr>"
    res += "</table>"
    return res

@flask_app.route( '/app1/<colName>' , methods=['GET'] )
def bkapp1_page( colName ) :
    script = server_document( url='http://localhost:5006/bkapp1' , arguments={'colName' : colName } )
    return render_template("embed.html", script=script)

@flask_app.route( '/app2/<colName>' , methods=['GET'] )
def bkapp2_page( colName ) :
    script = server_document( url='http://localhost:5006/bkapp2', arguments={'colName' : colName } )
    return render_template("embed.html", script=script)

@flask_app.route('/sendModelData/<colName>' , methods=['GET'] )
def sendModelData( colName ) :
    x = modelDf[ colName + "_x" ].tolist()
    y = modelDf[ colName + "_y" ].tolist()
    return jsonify( x=x , y=y )

if __name__ == '__main__':
    from tornado.httpserver import HTTPServer
    from tornado.wsgi import WSGIContainer
    from bokeh.util.browser import view

    print('Opening Flask app with embedded Bokeh application on http://localhost:8080/')

    # This uses Tornado to serve the WSGI app that Flask provides. Presumably the IOLoop
    # could also be started in a thread, and Flask could serve its own app directly
    http_server = HTTPServer(WSGIContainer(flask_app))
    http_server.listen(8080)

    io_loop.add_callback(view, "http://localhost:8080/")
    io_loop.start()

Here is the rendered page...

Here is some debug output...

C:\TestApp>python flask_embedJSONRoute.py
Opening Flask app with embedded Bokeh application on http://localhost:8080/
> C:\TestApp\flask_embedjsonroute.py(52)modify_doc2()
-> res = requests.get( url , timeout=None , verify=False )
(Pdb) n
> C:\TestApp\flask_embedjsonroute.py(53)modify_doc2()
-> print( "CODE %s" % res.status_code )
(Pdb) n
CODE 503
> C:\TestApp\flask_embedjsonroute.py(54)modify_doc2()
-> print( "ENCODING %s" % res.encoding )
(Pdb) n
ENCODING utf-8
> C:\TestApp\flask_embedjsonroute.py(55)modify_doc2()
-> print( "TEXT %s" % res.text )
(Pdb) n
TEXT
> C:\TestApp\flask_embedjsonroute.py(56)modify_doc2()
-> data = res.json()
(Pdb)

  File "C:\Anaconda3\lib\json\decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

This does not appear to be a problem with Bokeh itself, but rather a problem with threading and blocking in the server that runs the Flask app.

It is fully reproducible without Bokeh at all...

import requests
from flask import Flask, jsonify, request
import pandas
import pdb

flask_app = Flask(__name__)

# Populate some model maintained by the flask application
modelDf = pandas.DataFrame()
nData = 100
modelDf[ 'c1_x' ] = range(nData)
modelDf[ 'c1_y' ] = [ x*x for x in range(nData) ]
modelDf[ 'c2_x' ] = range(nData)
modelDf[ 'c2_y' ] = [ 2*x for x in range(nData) ]

@flask_app.route('/', methods=['GET'] )
def index():
    res =  "<table>"
    res += "<tr><td><a href=\"http://localhost:8080/sendModelData/c1\">SEND C1</a></td></tr>"
    res += "<tr><td><a href=\"http://localhost:8080/sendModelData/c2\">SEND C2</a></td></tr>"
    res += "<tr><td><a href=\"http://localhost:8080/RequestsOverFlaskNoProxy?colName=c1\">REQUEST OVER FLASK NO PROXY C1</a></td></tr>"
    res += "<tr><td><a href=\"http://localhost:8080/RequestsOverFlaskNoProxy?colName=c2\">REQUEST OVER FLASK NO PROXY C2</a></td></tr>"
    res += "<tr><td><a href=\"http://localhost:8080/RequestsOverFlask?colName=c1\">REQUEST OVER FLASK C1</a></td></tr>"
    res += "<tr><td><a href=\"http://localhost:8080/RequestsOverFlask?colName=c2\">REQUEST OVER FLASK C2</a></td></tr>"
    res += "</table>"   
    return res

@flask_app.route('/RequestsOverFlaskNoProxy')
def requestsOverFlaskNoProxy() :
    print("RequestsOverFlaskNoProxy")
    # get column name from query string
    colName = request.args.get('colName')

    # get model data from Flask
    url = "http://localhost:8080/sendModelData/%s" % colName

    print("Get data from %s" % url )
    session = requests.Session()
    session.trust_env = False
    res = session.get( url , timeout=5000 , verify=False )
    print( "CODE %s" % res.status_code )
    print( "ENCODING %s" % res.encoding )
    print( "TEXT %s" % res.text )
    data = res.json()
    return data

@flask_app.route('/RequestsOverFlask')
def requestsOverFlask() :
    # get column name from query string
    colName = request.args.get('colName')

    # get model data from Flask
    url = "http://localhost:8080/sendModelData/%s" % colName
    res = requests.get( url , timeout=None , verify=False )
    print( "CODE %s" % res.status_code )
    print( "ENCODING %s" % res.encoding )
    print( "TEXT %s" % res.text )
    data = res.json()
    return data

@flask_app.route('/sendModelData/<colName>' , methods=['GET'] )
def sendModelData( colName ) :
    x = modelDf[ colName + "_x" ].tolist()
    y = modelDf[ colName + "_y" ].tolist()
    return jsonify( x=x , y=y )

if __name__ == '__main__':
    print('Opening Flask app on http://localhost:8080/')

    # THIS DOES NOT WORK
    #flask_app.run( host='0.0.0.0' , port=8080 , debug=True )

    # THIS WORKS
    flask_app.run( host='0.0.0.0' , port=8080 , debug=True , threaded=True ) 

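As the screenshots show, serving the data directly from sendModelData renders the JSON appropriately, but fetching it through requests.get raises an exception because of the 503 code, as reported in the Python console.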

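If I make the same attempt while trying to eliminate the effect of the proxies, which I have enabled through environment variables, the call never completes and the request leaves the browser spinning indefinitely.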

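Come to think of it, using requests as a middleman may be completely unnecessary; I should be able to just get the JSON string and deserialize it myself. That would work in this setup, but in my actual code the Bokeh rendering is done in a completely different Python module from the Flask application, so these functions are not even available unless I scramble the layering of the app.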

EDIT: It turns out the fundamental thing I was violating is Flask's development environment...

You are running your WSGI app with the Flask test server, which by default uses a single thread to handle requests. So when your one request thread tries to call back into the same server, it is still busy trying to handle that one request.
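The quoted explanation can be reproduced without Flask. The sketch below is an illustration under stated assumptions, not the Flask development server itself: the standard library's `HTTPServer` (one request at a time) and `ThreadingHTTPServer` (one thread per request) stand in for running without and with `threaded=True`. The `/outer` and `/data` paths, the 1-second inner timeout, and the 503 that the handler returns when the inner call gets no answer are all choices made for this sketch; it does not claim to show how the 503 in the question was actually generated.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer, ThreadingHTTPServer

# Ignore proxy environment variables so the loopback call goes direct.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/data":
            status, body = 200, json.dumps({"x": [0, 1], "y": [0, 1]}).encode("utf-8")
        else:
            # Any other path calls back into the same server, like RequestsOverFlask.
            port = self.server.server_address[1]
            try:
                with opener.open("http://127.0.0.1:%d/data" % port, timeout=1) as res:
                    status, body = 200, res.read()
            except OSError:
                # The only worker is busy with this request, so nobody answered.
                status, body = 503, b'{"error": "inner request got no answer"}'
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def outer_status(server_cls):
    """Start a server of the given class, request /outer, return the status code."""
    server = server_cls(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = "http://127.0.0.1:%d/outer" % server.server_address[1]
        try:
            with opener.open(url, timeout=5) as res:
                return res.status
        except urllib.error.HTTPError as err:
            return err.code
    finally:
        server.shutdown()
        server.server_close()


single_threaded_status = outer_status(HTTPServer)
threaded_status = outer_status(ThreadingHTTPServer)
print("single-threaded:", single_threaded_status)
print("threaded:", threaded_status)
```

In the single-threaded case the inner request is accepted by the operating system but never read until the outer handler gives up, which is the same self-blocking pattern the quote describes.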

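So the question becomes how to apply this threaded=True technique in the original Bokeh example. That may not be possible, because the flask_embed.py example relies on the Tornado WSGI server, and this question suggests that Tornado is single-threaded by design. Given the findings above, an even sharper question is how AjaxDataSource avoids, altogether, the threading problems that the requests module runs into.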
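One direction, hinted at by the comment in the example ("Presumably the IOLoop could also be started in a thread"), is to keep the event loop and the data-serving app on different threads, so that a blocking fetch made from the loop is answered by someone else. The following is only a standard-library sketch of that idea under stated assumptions: an asyncio event loop stands in for the Tornado IOLoop, `ThreadingHTTPServer` stands in for Flask, and `DataHandler`, `blocking_fetch`, and `session_callback` are hypothetical names. It has not been tried against the actual Bokeh example.

```python
import asyncio
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class DataHandler(BaseHTTPRequestHandler):
    """Stand-in for the Flask sendModelData route."""

    def do_GET(self):
        body = json.dumps({"x": [0, 1, 2], "y": [0, 2, 4]}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def blocking_fetch(url):
    # Plays the role of requests.get(...).json(); proxies are bypassed on purpose.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5) as res:
        return json.loads(res.read().decode("utf-8"))


async def session_callback(url):
    # Deliberately blocks the event-loop thread, as modify_doc2 does.
    # It still completes because the data server runs on a different thread.
    return blocking_fetch(url)


data_server = ThreadingHTTPServer(("127.0.0.1", 0), DataHandler)
threading.Thread(target=data_server.serve_forever, daemon=True).start()
try:
    url = "http://127.0.0.1:%d/sendModelData/c2" % data_server.server_address[1]
    data = asyncio.run(session_callback(url))
finally:
    data_server.shutdown()
    data_server.server_close()
print(data)
```

The blocking call still stalls the loop while it waits, so this only removes the self-deadlock; it does not make the fetch asynchronous.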


UPDATE: More background on the coupling between Bokeh and Tornado...

53:05 so they're actually are not very many, the question is about the dependencies for Bokeh and the Bokeh server. The new Bokeh server is built on tornado and that's pretty much the main dependency is it uses tornado. Beyond that there's not very many dependencies, runtime dependencies, for Bokeh. pandas is an optional dependency for Bokeh.charts. There's other dependencies, you know numpy is used. But there's only, the list of dependencies I think is six or seven. We've tried to pare it down greatly over the years and so, but the main dependency of the server is tornado.

Source: Intro to Data Visualization with Bokeh - Part 1 - Strata Hadoop San Jose 2016