Using multiprocessing Pool with sklearn, code runs but cores don't show any work

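I'm trying to do some machine learning on about 31,000 rows and 1,000 columns. It takes a long time, so I thought I could parallelize the work. I wrapped it in a function and tried to use multiprocessing on Windows 10 in a Jupyter notebook. But the code just runs, and when I look at my cores in Task Manager they are not doing any work. Is something wrong with the code, or is this not supported?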

from sklearn.model_selection import train_test_split
X_dev, X_test, y_dev, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import Imputer
from sklearn.metrics import accuracy_score
from multiprocessing import Pool
from datetime import datetime as dt
import pandas as pd

def tree_paralel(x):
    # Relies on the globals X_dev, y_dev, kfolds, im and start defined elsewhere.
    tree = DecisionTreeClassifier(criterion="gini", max_depth=x, random_state=1)
    accuracy_ = []
    for train_idx, val_idx in kfolds.split(X_dev, y_dev):

        X_train, y_train = X_dev.iloc[train_idx], y_dev.iloc[train_idx]
        X_val, y_val = X_dev.iloc[val_idx], y_dev.iloc[val_idx]

        # Fit the imputer on the training fold only, then apply it to the validation fold.
        X_train = pd.DataFrame(im.fit_transform(X_train), index=X_train.index)
        X_val = pd.DataFrame(im.transform(X_val), index=X_val.index)
        tree.fit(X_train, y_train)
        y_pred = tree.predict(X_val)
        accuracy_.append(accuracy_score(y_val, y_pred))
    print("This was the " + str(x) + " iteration", (dt.now() - start).total_seconds())
    return accuracy_

Then I use the multiprocessing tool:

kfolds = KFold(n_splits=10)
accuracy = []
im = Imputer()

p = Pool(5)

input_ = range(1,11)
output_ = []
start = dt.now()
for result in p.imap(tree_paralel, input_):
    output_.append(result)
p.close()
print("Time:", (dt.now() - start).total_seconds())

This is a known issue when using interactive Python.
Quoting the note from the "Using a pool of workers" section of the multiprocessing documentation:

Note: Functionality within this package requires that the __main__ module be importable by the children. This is covered in Programming guidelines however it is worth pointing out here. This means that some examples, such as the multiprocessing.pool.Pool examples will not work in the interactive interpreter.

See also the multiprocessing Programming Guidelines.
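
To make the note concrete, here is a minimal sketch of the layout that the guidelines describe: the worker function lives in an importable module and the pool is created behind an `if __name__ == "__main__":` guard. The file name `tree_workers.py`, the function names `score_depth` and `run_all`, and the arithmetic inside the worker are all hypothetical stand-ins, not your actual tree-fitting code. It uses only the standard library so that the structure is the only thing on display.

```python
# tree_workers.py (hypothetical file name): save as a real .py file, not a notebook cell.
from multiprocessing import Pool


def score_depth(depth):
    """Stand-in for tree_paralel: CPU-bound work that depends only on its argument."""
    total = 0
    for i in range(200_000):
        total += (i * depth) % 7
    return depth, total


def run_all(depths, processes=5):
    # The pool sends workers a reference to score_depth by name; each worker
    # resolves it by importing this module, which is why it must be importable.
    with Pool(processes) as pool:
        return list(pool.imap(score_depth, depths))


if __name__ == "__main__":
    # The guard stops child processes from re-running this block when they
    # import the module (needed on Windows, where children are spawned).
    for depth, total in run_all(range(1, 11)):
        print(depth, total)
```

Run it with `python tree_workers.py`, or keep the notebook and do `from tree_workers import run_all` there, so that the worker function is defined in a module the child processes can import rather than in the notebook itself.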

By the way, I don't understand what you need to accomplish with your code. Wouldn't GridSearchCV with n_jobs=5 solve your problem (and greatly simplify your code)?
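
A rough sketch of what I mean, under some assumptions: the data here is synthetic (`make_classification`) because I don't have yours, `tune_max_depth` is a name I made up, and I use `SimpleImputer` from `sklearn.impute`, which replaces `Imputer` in newer scikit-learn releases. Putting the imputer in a `Pipeline` keeps your behavior of fitting it on each training fold only.

```python
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier


def tune_max_depth(X, y, n_jobs=5):
    """Cross-validate max_depth 1..10 in parallel (hypothetical helper)."""
    pipeline = Pipeline([
        ("imputer", SimpleImputer()),  # refit on every training fold
        ("tree", DecisionTreeClassifier(criterion="gini", random_state=1)),
    ])
    search = GridSearchCV(
        pipeline,
        param_grid={"tree__max_depth": list(range(1, 11))},
        scoring="accuracy",
        cv=KFold(n_splits=10),
        n_jobs=n_jobs,  # scikit-learn handles the worker processes itself
    )
    return search.fit(X, y)


if __name__ == "__main__":
    # Synthetic stand-in for X_dev / y_dev (assumption, not the real data).
    X, y = make_classification(n_samples=500, n_features=20, random_state=1)
    search = tune_max_depth(X, y)
    print(search.best_params_, search.best_score_)
```

The per-depth mean accuracies are in `search.cv_results_["mean_test_score"]`, which should cover what your `output_` list was collecting.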