Docker uwsgi-nginx-flask with joblib, unable to find local function, but works in standalone flask

I get the following error when I try to load a pre-trained model via joblib inside a Docker container:

web_1  | 2018-02-06 15:11:50,826 INFO success: nginx entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
web_1  | 2018-02-06 15:11:50,828 INFO success: uwsgi entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
web_1  | Traceback (most recent call last):
web_1  |   File "./app/main.py", line 23, in <module>
web_1  |     svm_detector_reloaded=joblib.load(filename);
web_1  |   File "/usr/local/lib/python3.6/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 578, in load
web_1  |     obj = _unpickle(fobj, filename, mmap_mode)
web_1  |   File "/usr/local/lib/python3.6/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 508, in _unpickle
web_1  |     obj = unpickler.load()
web_1  |   File "/usr/local/lib/python3.6/pickle.py", line 1050, in load
web_1  |     dispatch[key[0]](self)
web_1  |   File "/usr/local/lib/python3.6/pickle.py", line 1338, in load_global
web_1  |     klass = self.find_class(module, name)
web_1  |   File "/usr/local/lib/python3.6/pickle.py", line 1392, in find_class
web_1  |     return getattr(sys.modules[module], name)
web_1  | AttributeError: module '__main__' has no attribute 'split_into_lemmas'
web_1  | unable to load app 0 (mountpoint='') (callable not found or import error)
web_1  | *** no app loaded. going in full dynamic mode ***
web_1  | *** uWSGI is running in multiple interpreter mode ***

My main.py looks like this:

from flask import Flask
from flask import request
from flask import jsonify
from textblob import TextBlob
import sklearn
import numpy as np
from sklearn.externals import joblib

app = Flask(__name__)

from .api.utils import split_into_lemmas as split_into_lemmas

def split_into_lemmas(message):
    message=message.lower()
    words = TextBlob(message).words
    # for each word, take its "base form" = lemma 
    return [word.lemma for word in words]

def tollower(message):
    return message.lower()

filename = '../../data/sms_spam_detector.pkl'
svm_detector_reloaded=joblib.load(filename);

text="Testing"
lowerText=tollower(text)

@app.route('/')
def hello():
    return tollower("Test Test ");

@app.route('/detect/')
def route_detect():
    SMS=request.args.get('SMS')
    if(SMS==None or SMS==''):
        SMS="Test";
    return tollower(SMS);
#    test=[SMS]
#    message=  ( svm_detector_reloaded.predict(test)[0])
#    return SMS+"    "+message;

if __name__ == "__main__":
    # Only for debugging while developing
    app.run(host='0.0.0.0')

Basically, I downloaded example-flask-package-python3.6.zip from tiangolo/uwsgi-nginx-flask, added a data directory, and modified the Dockerfile and main.py. main.py is pasted above; the Dockerfile looks like this:

FROM tiangolo/uwsgi-nginx-flask:python3.6
ENV LISTEN_PORT 8080

EXPOSE 8080 
RUN pip3 install numpy TextBlob scikit-learn scipy

COPY ./app /app
COPY ./data /data

I then copied the pre-built model (stored via joblib) into the newly created data directory. The whole thing works fine if I run the code directly with python main.py, but not when I issue docker-compose up, which produces the error above. If I comment out the line svm_detector_reloaded=joblib.load(filename);, the Docker container comes up and everything works except the machine-learning part.

Basically, the defined function split_into_lemmas is not accessible when the model is unpickled.
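This matches how pickle handles functions: it serialises them by reference, storing only the module name and qualified name, never the function body. A minimal sketch of that behaviour (the stand-in body here is an assumption, not the real lemmatiser):

```python
import pickle

def split_into_lemmas(message):
    # stand-in body; pickle ignores the body entirely
    return message.lower().split()

payload = pickle.dumps(split_into_lemmas)

# The pickle bytes hold only "module" + "qualified name" as a reference,
# so unpickling in another process requires that exact attribute path.
print(split_into_lemmas.__module__)                      # "__main__" when run as a script
print(b"split_into_lemmas" in payload)                   # True
print(split_into_lemmas.__module__.encode() in payload)  # True
```

When the model was trained in a script (or notebook), the function lived in `__main__`, so the pickle records `__main__.split_into_lemmas`; under uWSGI the app module is not `__main__`, and the lookup fails with exactly the AttributeError above.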

What am I doing wrong here? The model was built following the steps at http://radimrehurek.com/data_science_python; the actual model is built in step 6.

OK, I was able to solve it. I got the clue from 3614379. Instead of keeping the function in the main file itself, I first created split_into_lemmas in a separate module (a .py file) and imported that module at training time. Then, in my Docker instance, I imported the same module as well. That solved the problem.
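The fix described above can be sketched as follows. The module name `lemmas` and the simplified function body are assumptions, and the two processes (training script and Docker app) are simulated in one script by writing the shared module to a temp directory:

```python
import os
import pickle
import sys
import tempfile

# Shared module ("lemmas.py"; the name is an assumption) that both the
# training script and the Flask app inside Docker would import.
module_src = (
    "def split_into_lemmas(message):\n"
    "    return message.lower().split()\n"  # simplified stand-in body
)

tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "lemmas.py"), "w") as fh:
    fh.write(module_src)
sys.path.insert(0, tmpdir)

# "Training" side: import from the module, then pickle. The pickle now
# references lemmas.split_into_lemmas instead of __main__.split_into_lemmas.
from lemmas import split_into_lemmas
payload = pickle.dumps(split_into_lemmas)

# "Serving" side: any process that can `import lemmas` can unpickle it.
restored = pickle.loads(payload)
print(restored("Hello World"))  # ['hello', 'world']
```

The same applies to the full scikit-learn pipeline: as long as the pickle references `lemmas.split_into_lemmas` and the serving container has `lemmas.py` on its import path, joblib.load succeeds.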