Python Multiprocessing Pool Errors: XGBoost Importing
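I'm trying to run a multiprocessing pool over a pandas DataFrame to update some data through a series of parallel API requests. I'm running into an import error with XGBoost (one of the libraries I use), but only when I start the multiprocessing pool.
Here is an excerpt of the error: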
2020-05-02 20:52:58,338 - WARNING - Traceback (most recent call last):
2020-05-02 20:52:58,338 - WARNING - File "<string>", line 1, in <module>
2020-05-02 20:52:58,339 - WARNING - File "C:\Users\Garett\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 105, in spawn_main
2020-05-02 20:52:58,339 - WARNING - exitcode = _main(fd)
2020-05-02 20:52:58,339 - WARNING - File "C:\Users\Garett\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 114, in _main
2020-05-02 20:52:58,340 - WARNING - prepare(preparation_data)
2020-05-02 20:52:58,340 - WARNING - File "C:\Users\Garett\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 225, in prepare
2020-05-02 20:52:58,340 - WARNING - _fixup_main_from_path(data['init_main_from_path'])
2020-05-02 20:52:58,340 - WARNING - File "C:\Users\Garett\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
2020-05-02 20:52:58,341 - WARNING - run_name="__mp_main__")
2020-05-02 20:52:58,341 - WARNING - File "C:\Users\Garett\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 263, in run_path
2020-05-02 20:52:58,341 - WARNING - pkg_name=pkg_name, script_name=fname)
2020-05-02 20:52:58,342 - WARNING - File "C:\Users\Garett\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 96, in _run_module_code
2020-05-02 20:52:58,342 - WARNING - mod_name, mod_spec, pkg_name, script_name)
2020-05-02 20:52:58,342 - WARNING - Traceback (most recent call last):
2020-05-02 20:52:58,342 - WARNING - File "C:\Users\Garett\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 85, in _run_code
2020-05-02 20:52:58,342 - WARNING - File "<string>", line 1, in <module>
2020-05-02 20:52:58,342 - WARNING - exec(code, run_globals)
2020-05-02 20:52:58,343 - WARNING - File "D:\GitSpace\Financial\Stock-Analysis-ML\equity_finder\main.py", line 13, in <module>
2020-05-02 20:52:58,343 - WARNING - File "C:\Users\Garett\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 105, in spawn_main
2020-05-02 20:52:58,343 - WARNING - from equity_finder.utils.model_loading_utils import save_model, load_model
2020-05-02 20:52:58,343 - WARNING - File "D:\GitSpace\Financial\Stock-Analysis-ML\equity_finder\utils\model_loading_utils.py", line 5, in <module>
2020-05-02 20:52:58,343 - WARNING - exitcode = _main(fd)
2020-05-02 20:52:58,343 - WARNING - File "C:\Users\Garett\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 114, in _main
2020-05-02 20:52:58,343 - WARNING - from equity_finder.modelgenerator.model_container import ModelContainer
2020-05-02 20:52:58,344 - WARNING - File "D:\GitSpace\Financial\Stock-Analysis-ML\equity_finder\modelgenerator\model_container.py", line 9, in <module>
2020-05-02 20:52:58,344 - WARNING - prepare(preparation_data)
2020-05-02 20:52:58,344 - WARNING - from xgboost import XGBClassifier
2020-05-02 20:52:58,344 - WARNING - File "C:\Users\Garett\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 225, in prepare
2020-05-02 20:52:58,344 - WARNING - File "D:\GitSpace\Financial\Stock-Analysis-ML\.env\lib\site-packages\xgboost\__init__.py", line 11, in <module>
2020-05-02 20:52:58,344 - WARNING - _fixup_main_from_path(data['init_main_from_path'])
2020-05-02 20:52:58,344 - WARNING - from .core import DMatrix, Booster
2020-05-02 20:52:58,344 - WARNING - File "C:\Users\Garett\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
2020-05-02 20:52:58,344 - WARNING - File "D:\GitSpace\Financial\Stock-Analysis-ML\.env\lib\site-packages\xgboost\core.py", line 174, in <module>
2020-05-02 20:52:58,345 - WARNING - run_name="__mp_main__")
2020-05-02 20:52:58,345 - WARNING - _LIB = _load_lib()
2020-05-02 20:52:58,345 - WARNING - File "C:\Users\Garett\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 263, in run_path
2020-05-02 20:52:58,345 - WARNING - File "D:\GitSpace\Financial\Stock-Analysis-ML\.env\lib\site-packages\xgboost\core.py", line 134, in _load_lib
2020-05-02 20:52:58,345 - WARNING - lib_paths = find_lib_path()
2020-05-02 20:52:58,345 - WARNING - pkg_name=pkg_name, script_name=fname)
2020-05-02 20:52:58,345 - WARNING - File "D:\GitSpace\Financial\Stock-Analysis-ML\.env\lib\site-packages\xgboost\libpath.py", line 50, in find_lib_path
2020-05-02 20:52:58,346 - WARNING - File "C:\Users\Garett\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 96, in _run_module_code
2020-05-02 20:52:58,346 - WARNING - 'List of candidates:\n' + ('\n'.join(dll_path)))
2020-05-02 20:52:58,346 - WARNING - mod_name, mod_spec, pkg_name, script_name)
2020-05-02 20:52:58,346 - WARNING - xgboost.libpath
2020-05-02 20:52:58,346 - WARNING - File "C:\Users\Garett\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 85, in _run_code
2020-05-02 20:52:58,346 - WARNING - .
2020-05-02 20:52:58,346 - WARNING - XGBoostLibraryNotFound
2020-05-02 20:52:58,346 - WARNING - :
2020-05-02 20:52:58,346 - WARNING - exec(code, run_globals)
2020-05-02 20:52:58,346 - WARNING - Cannot find XGBoost Library in the candidate path, did you install compilers and run build.sh in root path?
2020-05-02 20:52:58,347 - WARNING - File "D:\GitSpace\Financial\Stock-Analysis-ML\equity_finder\main.py", line 13, in <module>
2020-05-02 20:52:58,347 - WARNING - D:\GitSpace\Financial\Stock-Analysis-ML\.env\lib\site-packages\xgboost\xgboost.dll
2020-05-02 20:52:58,347 - WARNING - D:\GitSpace\Financial\Stock-Analysis-ML\.env\lib\site-packages\xgboost\../../lib/xgboost.dll
2020-05-02 20:52:58,347 - WARNING - from equity_finder.utils.model_loading_utils import save_model, load_model
2020-05-02 20:52:58,347 - WARNING - D:\GitSpace\Financial\Stock-Analysis-ML\.env\lib\site-packages\xgboost\./lib/xgboost.dll
2020-05-02 20:52:58,347 - WARNING - File "D:\GitSpace\Financial\Stock-Analysis-ML\equity_finder\utils\model_loading_utils.py", line 5, in <module>
2020-05-02 20:52:58,347 - WARNING - C:\Users\Garett\AppData\Local\Programs\Python\Python37\xgboost\xgboost.dll
2020-05-02 20:52:58,347 - WARNING - D:\GitSpace\Financial\Stock-Analysis-ML\.env\lib\site-packages\xgboost\../../windows/x64/Release/xgboost.dll
2020-05-02 20:52:58,347 - WARNING - from equity_finder.modelgenerator.model_container import ModelContainer
2020-05-02 20:52:58,347 - WARNING - D:\GitSpace\Financial\Stock-Analysis-ML\.env\lib\site-packages\xgboost\./windows/x64/Release/xgboost.dll
2020-05-02 20:52:58,348 - WARNING - File "D:\GitSpace\Financial\Stock-Analysis-ML\equity_finder\modelgenerator\model_container.py", line 9, in <module>
2020-05-02 20:52:58,348 - WARNING - from xgboost import XGBClassifier
2020-05-02 20:52:58,348 - WARNING - File "D:\GitSpace\Financial\Stock-Analysis-ML\.env\lib\site-packages\xgboost\__init__.py", line 11, in <module>
2020-05-02 20:52:58,348 - WARNING - from .core import DMatrix, Booster
2020-05-02 20:52:58,348 - WARNING - File "D:\GitSpace\Financial\Stock-Analysis-ML\.env\lib\site-packages\xgboost\core.py", line 174, in <module>
2020-05-02 20:52:58,349 - WARNING - _LIB = _load_lib()
2020-05-02 20:52:58,349 - WARNING - File "D:\GitSpace\Financial\Stock-Analysis-ML\.env\lib\site-packages\xgboost\core.py", line 134, in _load_lib
2020-05-02 20:52:58,349 - WARNING - lib_paths = find_lib_path()
2020-05-02 20:52:58,349 - WARNING - File "D:\GitSpace\Financial\Stock-Analysis-ML\.env\lib\site-packages\xgboost\libpath.py", line 50, in find_lib_path
2020-05-02 20:52:58,350 - WARNING - 'List of candidates:\n' + ('\n'.join(dll_path)))
2020-05-02 20:52:58,350 - WARNING - xgboost.libpath
2020-05-02 20:52:58,350 - WARNING - .
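The errors continue indefinitely, and then my logger starts failing due to overflow. What I think is happening is that each process in the pool fails, then the next one starts and fails again.
My program's entry point is guarded by if __name__ == '__main__':

    if __name__ == '__main__':
        equity_finder: EquityFinder = EquityFinder()
        equity_finder.equity_finder(sys.argv[1:], configuration.get())

This is the function that performs the parallelization. It is imported as a submodule: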

    from itertools import repeat
    from multiprocessing import Pool, cpu_count
    from typing import List, Callable

    import numpy as np
    import pandas as pd
    from pandas import DataFrame


    def parallelize_df_func(df: DataFrame, func: Callable, args: List):
        """
        This function is used to parallelize a function that is to be applied over a DataFrame

        Args:
            df (DataFrame):
            func (Callable):
            args (List):

        Returns:
        """
        # Leaving one core free to not freeze the machine
        num_cores = cpu_count() - 1
        df_split = np.array_split(df, num_cores)
        with Pool(cpu_count()) as pool:
            df = pd.concat(pool.starmap(func, zip(df_split, *[repeat(arg) for arg in args])))
        return df

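As an aside, the comment in the function above mentions leaving one core free, but the pool itself is created with cpu_count() workers, and cpu_count() - 1 would be 0 on a single-core machine. A minimal sketch of keeping the two consistent follows; the helper name pick_worker_count is hypothetical and not part of the original code:

```python
import os
from typing import Optional


def pick_worker_count(total_cores: Optional[int] = None) -> int:
    """Return how many workers to start, leaving one core free when possible."""
    if total_cores is None:
        # os.cpu_count() can return None when the count cannot be determined
        total_cores = os.cpu_count() or 1
    # Never go below one worker, even on a single-core machine
    return max(1, total_cores - 1)


num_cores = pick_worker_count()
# The same value would then be used for both the split and the pool, e.g.:
#   df_split = np.array_split(df, num_cores)
#   with Pool(num_cores) as pool: ...
print(num_cores)
```

This does not address the import error itself; it only makes the number of chunks and the number of workers agree.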
This is the function I pass to parallelize_df_func:

    def _replace_price_with_current(df: DataFrame, valid_price_dates: str) -> DataFrame:
        def _vectorized_replacement(ticker: str, current_price: float, dates: str) -> float:
            data = ...  # API call redacted (I left the API call out for privacy)
            if data.empty:
                return current_price
            return data.iloc[0].close

        return np.vectorize(_vectorized_replacement)(df['ticker'],
                                                     df['price'],
                                                     str(valid_price_dates))

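Has anyone run into this problem? Any help would be greatly appreciated. Thanks!

I ended up moving on from this, but later I did notice that I was getting import issues when launching the multiprocessing pool because the imports were not inside the if __name__ == "__main__": block.
I extracted the entry point into its own Python file and did the following, which fixed my problem: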

    if __name__ == "__main__":
        import sys
        from equity_finder.pipeline import EquityFinder
        from equity_finder.configurations import configuration

        equity_finder: EquityFinder = EquityFinder()
        equity_finder.equity_finder(sys.argv[1:], configuration.get())
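
The reason this helps is most likely the spawn start method that multiprocessing uses on Windows: each worker re-runs the main script under the name __mp_main__ (the traceback shows runpy.run_path being called with run_name="__mp_main__"). Top-level imports in the main script therefore execute again in every worker, while anything under the guard does not. The sketch below only simulates that step with runpy and a temporary script, without starting real worker processes; the names SCRIPT, run_as and entry.py are made up for illustration:

```python
import runpy
import tempfile
from pathlib import Path

# A stand-in for main.py: one top-level statement and one guarded statement.
SCRIPT = '''
executed = ["top-level import section"]  # stands in for heavy imports such as xgboost

if __name__ == "__main__":
    executed.append("guarded entry point")
'''


def run_as(run_name: str) -> list:
    """Execute SCRIPT from a temporary file with __name__ set to run_name."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "entry.py"
        path.write_text(SCRIPT)
        module_globals = runpy.run_path(str(path), run_name=run_name)
    return module_globals["executed"]


as_parent = run_as("__main__")     # how the script runs when launched directly
as_worker = run_as("__mp_main__")  # how a spawned worker re-runs it

print(as_parent)  # ['top-level import section', 'guarded entry point']
print(as_worker)  # ['top-level import section']
```

With the imports moved under the guard, as in the entry point above, a spawned worker no longer triggers the import chain that ends in xgboost when it re-runs the main script.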