Multiprocessing with a complex function
I want to run a fairly complex Python function in parallel with joblib. I first tested the approach with much simpler code like this:
```python
import pandas as pd
from joblib import Parallel, delayed, parallel_backend
from joblib import load, dump


def print_hello(hallo, tschüß, rechnen, i):
    print(i)
    print(hallo[2])
    print(tschüß)
    rechnen = rechnen + i
    # Round-trip the list through a per-task CSV file
    hallo2 = pd.DataFrame(hallo)
    hallo2.to_csv('./hallo' + str(i) + '.csv')
    hallo1 = pd.read_csv('./hallo' + str(i) + '.csv')
    return rechnen


hallo = ['hallo', 'hi', 'hey']
tschüß = 'tschüß'
rechnen = 0  # assumed starting value; must be defined before the parallel call

with parallel_backend('threading'):
    test = Parallel()(delayed(print_hello)(hallo, tschüß, rechnen, i) for i in range(10))
print(test)
```
This works fine. However, I get the following error:
```
joblib.my_exceptions.TransportableException: TransportableException
...
joblib.my_exceptions.JoblibTypeError: JoblibTypeError
...
TypeError: sum_row() missing 1 required positional argument: 'i'
```
when I try to run my more complex function, which looks like this:
```python
def sum_row(count_series, path, folder, files_1, files_2, files_3, path_raw, i):
    print(i)
    df1 = pd.read_csv(path_raw + files_1[i], sep=',', low_memory=False)
    df2 = pd.read_csv(path_raw + files_2[i], sep=',', low_memory=False)
    df3 = pd.read_csv(path_raw + files_3[i], sep=',', low_memory=False)
    ## do some operations with those files and create df_test
    df_test.to_csv(path + folder + files_export[i])
    return 0


with parallel_backend('threading'):
    test = Parallel()(delayed(sum_row)(count_series, path, files_1, files_2, files_3, path_raw, i) for i in range(len(files_1)))
```
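You get the error because you left out the `folder` argument when calling the function. `sum_row` takes eight positional parameters, but the call passes only seven values. Every value after `path` therefore shifts one slot to the left (`files_1` lands in `folder`, `i` lands in `path_raw`), and the last parameter, `i`, is left unfilled, which is exactly what the `TypeError` reports. Pass `folder` in its place: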
```python
test = Parallel()(delayed(sum_row)(count_series, path, folder, files_1, files_2,
                                   files_3, path_raw, i) for i in range(len(files_1)))
```
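To see the shift concretely, here is a minimal sketch using only the standard library. The `sum_row` below is a hypothetical stand-in that just reports which value ended up in which parameter, the sample values are made up, and `ThreadPoolExecutor` stands in for joblib's threading backend. `inspect.signature(...).bind_partial` shows where the seven values land, and `bind` raises the same kind of `TypeError` as the real call. Passing the constant arguments by keyword (here via `functools.partial`) means a forgotten argument is reported by name instead of silently shifting the others.

```python
import inspect
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def sum_row(count_series, path, folder, files_1, files_2, files_3, path_raw, i):
    # Hypothetical stand-in: report where the values ended up instead of reading CSVs.
    return {"folder": folder, "path_raw": path_raw, "i": i}


sig = inspect.signature(sum_row)

# The call from the question: seven positional values for eight parameters
# (folder is missing). Sample values are made up.
args = ("counts", "out/", ["a.csv"], ["b.csv"], ["c.csv"], "raw/", 0)

# bind_partial tolerates missing parameters, so we can inspect the shift.
shifted = sig.bind_partial(*args).arguments
print(shifted["folder"], shifted["path_raw"])  # ['a.csv'] 0

# bind mimics a real call and fails just like sum_row() did inside joblib.
bind_error = None
try:
    sig.bind(*args)
except TypeError as exc:
    bind_error = str(exc)
print(bind_error)

# Safer: pass every constant argument by keyword and vary only i.
task = partial(
    sum_row,
    count_series="counts",
    path="out/",
    folder="sub/",
    files_1=["a.csv"],
    files_2=["b.csv"],
    files_3=["c.csv"],
    path_raw="raw/",
)

with ThreadPoolExecutor() as pool:
    # i must also be passed by keyword; positionally it would collide with count_series.
    results = list(pool.map(lambda i: task(i=i), range(3)))
print(results)
```

The same keyword style works directly with joblib, since `delayed(sum_row)(...)` accepts keyword arguments, e.g. `delayed(sum_row)(count_series=count_series, path=path, folder=folder, ..., i=i)`.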