Python Multiprocessing Pool Errors: XGBoost Importing

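I'm trying to run a multiprocessing pool over a pandas DataFrame to update some data through a series of parallel API requests. I'm running into an import error for XGBoost (one of the libraries I use), but only when I start the multiprocessing pool.

Here is an excerpt of the error: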

2020-05-02 20:52:58,338 - WARNING - Traceback (most recent call last):
2020-05-02 20:52:58,338 - WARNING -   File "<string>", line 1, in <module>
2020-05-02 20:52:58,339 - WARNING -   File "C:\Users\Garett\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 105, in spawn_main
2020-05-02 20:52:58,339 - WARNING - exitcode = _main(fd)
2020-05-02 20:52:58,339 - WARNING -   File "C:\Users\Garett\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 114, in _main
2020-05-02 20:52:58,340 - WARNING - prepare(preparation_data)
2020-05-02 20:52:58,340 - WARNING -   File "C:\Users\Garett\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 225, in prepare
2020-05-02 20:52:58,340 - WARNING - _fixup_main_from_path(data['init_main_from_path'])
2020-05-02 20:52:58,340 - WARNING -   File "C:\Users\Garett\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
2020-05-02 20:52:58,341 - WARNING - run_name="__mp_main__")
2020-05-02 20:52:58,341 - WARNING -   File "C:\Users\Garett\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 263, in run_path
2020-05-02 20:52:58,341 - WARNING - pkg_name=pkg_name, script_name=fname)
2020-05-02 20:52:58,342 - WARNING -   File "C:\Users\Garett\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 96, in _run_module_code
2020-05-02 20:52:58,342 - WARNING - mod_name, mod_spec, pkg_name, script_name)
2020-05-02 20:52:58,342 - WARNING - Traceback (most recent call last):
2020-05-02 20:52:58,342 - WARNING -   File "C:\Users\Garett\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 85, in _run_code
2020-05-02 20:52:58,342 - WARNING -   File "<string>", line 1, in <module>
2020-05-02 20:52:58,342 - WARNING - exec(code, run_globals)
2020-05-02 20:52:58,343 - WARNING -   File "D:\GitSpace\Financial\Stock-Analysis-ML\equity_finder\main.py", line 13, in <module>
2020-05-02 20:52:58,343 - WARNING -   File "C:\Users\Garett\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 105, in spawn_main
2020-05-02 20:52:58,343 - WARNING - from equity_finder.utils.model_loading_utils import save_model, load_model
2020-05-02 20:52:58,343 - WARNING -   File "D:\GitSpace\Financial\Stock-Analysis-ML\equity_finder\utils\model_loading_utils.py", line 5, in <module>
2020-05-02 20:52:58,343 - WARNING - exitcode = _main(fd)
2020-05-02 20:52:58,343 - WARNING -   File "C:\Users\Garett\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 114, in _main
2020-05-02 20:52:58,343 - WARNING - from equity_finder.modelgenerator.model_container import ModelContainer
2020-05-02 20:52:58,344 - WARNING -   File "D:\GitSpace\Financial\Stock-Analysis-ML\equity_finder\modelgenerator\model_container.py", line 9, in <module>
2020-05-02 20:52:58,344 - WARNING - prepare(preparation_data)
2020-05-02 20:52:58,344 - WARNING - from xgboost import XGBClassifier
2020-05-02 20:52:58,344 - WARNING -   File "C:\Users\Garett\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 225, in prepare
2020-05-02 20:52:58,344 - WARNING -   File "D:\GitSpace\Financial\Stock-Analysis-ML\.env\lib\site-packages\xgboost\__init__.py", line 11, in <module>
2020-05-02 20:52:58,344 - WARNING - _fixup_main_from_path(data['init_main_from_path'])
2020-05-02 20:52:58,344 - WARNING - from .core import DMatrix, Booster
2020-05-02 20:52:58,344 - WARNING -   File "C:\Users\Garett\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
2020-05-02 20:52:58,344 - WARNING -   File "D:\GitSpace\Financial\Stock-Analysis-ML\.env\lib\site-packages\xgboost\core.py", line 174, in <module>
2020-05-02 20:52:58,345 - WARNING - run_name="__mp_main__")
2020-05-02 20:52:58,345 - WARNING - _LIB = _load_lib()
2020-05-02 20:52:58,345 - WARNING -   File "C:\Users\Garett\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 263, in run_path
2020-05-02 20:52:58,345 - WARNING -   File "D:\GitSpace\Financial\Stock-Analysis-ML\.env\lib\site-packages\xgboost\core.py", line 134, in _load_lib
2020-05-02 20:52:58,345 - WARNING - lib_paths = find_lib_path()
2020-05-02 20:52:58,345 - WARNING - pkg_name=pkg_name, script_name=fname)
2020-05-02 20:52:58,345 - WARNING -   File "D:\GitSpace\Financial\Stock-Analysis-ML\.env\lib\site-packages\xgboost\libpath.py", line 50, in find_lib_path
2020-05-02 20:52:58,346 - WARNING -   File "C:\Users\Garett\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 96, in _run_module_code
2020-05-02 20:52:58,346 - WARNING - 'List of candidates:\n' + ('\n'.join(dll_path)))
2020-05-02 20:52:58,346 - WARNING - mod_name, mod_spec, pkg_name, script_name)
2020-05-02 20:52:58,346 - WARNING - xgboost.libpath
2020-05-02 20:52:58,346 - WARNING -   File "C:\Users\Garett\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 85, in _run_code
2020-05-02 20:52:58,346 - WARNING - .
2020-05-02 20:52:58,346 - WARNING - XGBoostLibraryNotFound
2020-05-02 20:52:58,346 - WARNING - :
2020-05-02 20:52:58,346 - WARNING - exec(code, run_globals)
2020-05-02 20:52:58,346 - WARNING - Cannot find XGBoost Library in the candidate path, did you install compilers and run build.sh in root path?
2020-05-02 20:52:58,347 - WARNING -   File "D:\GitSpace\Financial\Stock-Analysis-ML\equity_finder\main.py", line 13, in <module>
2020-05-02 20:52:58,347 - WARNING - D:\GitSpace\Financial\Stock-Analysis-ML\.env\lib\site-packages\xgboost\xgboost.dll
2020-05-02 20:52:58,347 - WARNING - D:\GitSpace\Financial\Stock-Analysis-ML\.env\lib\site-packages\xgboost\../../lib/xgboost.dll
2020-05-02 20:52:58,347 - WARNING - from equity_finder.utils.model_loading_utils import save_model, load_model
2020-05-02 20:52:58,347 - WARNING - D:\GitSpace\Financial\Stock-Analysis-ML\.env\lib\site-packages\xgboost\./lib/xgboost.dll
2020-05-02 20:52:58,347 - WARNING -   File "D:\GitSpace\Financial\Stock-Analysis-ML\equity_finder\utils\model_loading_utils.py", line 5, in <module>
2020-05-02 20:52:58,347 - WARNING - C:\Users\Garett\AppData\Local\Programs\Python\Python37\xgboost\xgboost.dll
2020-05-02 20:52:58,347 - WARNING - D:\GitSpace\Financial\Stock-Analysis-ML\.env\lib\site-packages\xgboost\../../windows/x64/Release/xgboost.dll
2020-05-02 20:52:58,347 - WARNING - from equity_finder.modelgenerator.model_container import ModelContainer
2020-05-02 20:52:58,347 - WARNING - D:\GitSpace\Financial\Stock-Analysis-ML\.env\lib\site-packages\xgboost\./windows/x64/Release/xgboost.dll
2020-05-02 20:52:58,348 - WARNING -   File "D:\GitSpace\Financial\Stock-Analysis-ML\equity_finder\modelgenerator\model_container.py", line 9, in <module>
2020-05-02 20:52:58,348 - WARNING - from xgboost import XGBClassifier
2020-05-02 20:52:58,348 - WARNING -   File "D:\GitSpace\Financial\Stock-Analysis-ML\.env\lib\site-packages\xgboost\__init__.py", line 11, in <module>
2020-05-02 20:52:58,348 - WARNING - from .core import DMatrix, Booster
2020-05-02 20:52:58,348 - WARNING -   File "D:\GitSpace\Financial\Stock-Analysis-ML\.env\lib\site-packages\xgboost\core.py", line 174, in <module>
2020-05-02 20:52:58,349 - WARNING - _LIB = _load_lib()
2020-05-02 20:52:58,349 - WARNING -   File "D:\GitSpace\Financial\Stock-Analysis-ML\.env\lib\site-packages\xgboost\core.py", line 134, in _load_lib
2020-05-02 20:52:58,349 - WARNING - lib_paths = find_lib_path()
2020-05-02 20:52:58,349 - WARNING -   File "D:\GitSpace\Financial\Stock-Analysis-ML\.env\lib\site-packages\xgboost\libpath.py", line 50, in find_lib_path
2020-05-02 20:52:58,350 - WARNING - 'List of candidates:\n' + ('\n'.join(dll_path)))
2020-05-02 20:52:58,350 - WARNING - xgboost.libpath
2020-05-02 20:52:58,350 - WARNING - .

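The errors continue indefinitely, and then my logger starts failing due to overflow. I think what is happening is that each process in the pool fails, then the next process starts and fails again.

The entry point of my program is guarded by if __name__ == '__main__':
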
if __name__ == '__main__':
    equity_finder: EquityFinder = EquityFinder()
    equity_finder.equity_finder(sys.argv[1:], configuration.get())

This is the function that performs the parallelization. It is imported as a submodule.

from itertools import repeat
from multiprocessing import Pool, cpu_count
from typing import Callable, List

import numpy as np
import pandas as pd
from pandas import DataFrame


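def parallelize_df_func(df: DataFrame, func: Callable, args: List) -> DataFrame:
    """
    Apply a function to a DataFrame in parallel by splitting it into chunks.

    Args:
        df (DataFrame): The DataFrame to split across worker processes.
        func (Callable): Function applied to each chunk, called as func(chunk, *args).
        args (List): Extra arguments passed unchanged to every call of func.

    Returns:
        DataFrame: The per-chunk results joined with pd.concat.
    """
    # Leave one core free so the machine does not freeze, but use at least one
    num_cores = max(1, cpu_count() - 1)
    df_split = np.array_split(df, num_cores)
    # Size the pool to num_cores so the spare core actually stays free
    with Pool(num_cores) as pool:
        df = pd.concat(pool.starmap(func, zip(df_split, *[repeat(arg) for arg in args])))
    return df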
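
The starmap call in the function above packs each chunk together with the shared extra arguments. A minimal serial sketch of how those argument tuples are built is shown here. It uses plain lists instead of DataFrame chunks and itertools.starmap instead of Pool.starmap, and the names chunks, extra_args and tag_chunk are hypothetical stand-ins, not part of the original code.

```python
from itertools import repeat, starmap

# Hypothetical stand-ins: two "chunks" of tickers instead of DataFrame pieces
chunks = [["AAPL", "MSFT"], ["GOOG"]]
# Hypothetical extra arguments that every call should receive unchanged
extra_args = ["2020-05-01", 3]


def tag_chunk(chunk, date, retries):
    """Toy worker: label every ticker with the shared arguments."""
    return [f"{ticker}:{date}:{retries}" for ticker in chunk]


# repeat(arg) is infinite, so zip stops when the chunks run out:
# one tuple per chunk, each carrying the same extra arguments
calls = list(zip(chunks, *[repeat(arg) for arg in extra_args]))

# itertools.starmap unpacks each tuple into tag_chunk(chunk, date, retries),
# which is what Pool.starmap does across worker processes
results = list(starmap(tag_chunk, calls))

print(calls)
print(results)
```

Because every tuple is sent to a worker process, each extra argument has to be picklable when the real Pool is used.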

This is the function I pass to parallelize_df_func:

def _replace_price_with_current(df: DataFrame, valid_price_dates: str) -> DataFrame:
    def _vectorized_replacement(ticker: str, current_price: float, dates: str) -> float:
        data = ...  # API call redacted (I left the API call out for privacy)
        if data.empty:
            return current_price
        return data.iloc[0].close
    return np.vectorize(_vectorized_replacement)(df['ticker'],
                                                 df['price'],
                                                 str(valid_price_dates))

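Has anyone run into this problem? Any help would be greatly appreciated. Thanks!

I eventually moved on from this, but later I did notice that I was hitting import problems when starting the multiprocessing pool because the imports were not inside the if __name__ == "__main__": block.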
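
The traceback in the question shows each spawned worker re-running main.py through runpy.run_path with run_name="__mp_main__". The sketch below imitates that step without starting any processes. It assumes a hypothetical two-statement stand-in script rather than the real main.py, and the helper name run_as is made up for illustration.

```python
import os
import runpy
import tempfile

# Hypothetical stand-in for main.py: one top-level statement (in the place
# of a module-level `from xgboost import ...`) and one guarded statement.
SCRIPT = '''
events = ["top-level code ran"]
if __name__ == "__main__":
    events.append("entry point ran")
'''


def run_as(run_name):
    """Run SCRIPT from a temporary file under the given module name."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "fake_main.py")
        with open(path, "w") as handle:
            handle.write(SCRIPT)
        # run_path returns the globals of the executed script
        return runpy.run_path(path, run_name=run_name)["events"]


parent_events = run_as("__main__")     # like `python main.py`
child_events = run_as("__mp_main__")   # like a spawned pool worker

print(parent_events)
print(child_events)
```

In both runs the top-level statement executes, while the guarded statement runs only under "__main__". This is consistent with the observation that imports left outside the guard are executed again in every worker.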

I extracted the entry point into its own Python file and did the following, which solved my problem:

if __name__ == "__main__":
    import sys

    from equity_finder.pipeline import EquityFinder
    from equity_finder.configurations import configuration

    equity_finder: EquityFinder = EquityFinder()
    equity_finder.equity_finder(sys.argv[1:], configuration.get())