Parallelize populating an ndarray from a pandas Series and a CSR matrix

Question

I am currently using a for loop to populate ndarrays with values from a pandas Series (category/object dtype) and a SciPy CSR matrix, and I would like to speed this up.

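What I have tried: a sequential for loop (works), numba (does not accept Series or strings), joblib (slower than the sequential loop), and swifter.apply (much slower, because I have to go through pandas, although it does parallelize).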

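import string

import numpy as np
import pandas as pd
from scipy.sparse import rand

nr_matches = 10**5
# pd.util.testing.rands_array is not available in pandas 2.x, so build random 10-character names with NumPy
rng = np.random.default_rng()
chars = np.array(list(string.ascii_letters + string.digits))
name_vector = pd.Series(["".join(r) for r in rng.choice(chars, size=(nr_matches, 10))])
matches = rand(nr_matches, 10, density=0.2, format='csr')
sparserows, sparsecols = matches.nonzero()
nnz = len(sparserows)  # number of non-zero entries (about 2 * 10**5 here), not nr_matches

left_side = np.empty(nnz, dtype=object)
right_side = np.empty(nnz, dtype=object)
similarity = np.zeros(nnz)

# One Python-level iteration (three scalar lookups) per non-zero entry
for index in range(nnz):
    left_side[index] = name_vector.iat[sparserows[index]]
    right_side[index] = name_vector.iat[sparsecols[index]]
    similarity[index] = matches.data[index]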

There are no error messages, but it is slow because it runs on a single thread.
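The loop pairs the `index`-th entry of `matches.nonzero()` with `matches.data[index]`. Those only line up while the matrix stores no explicit zeros: `nonzero()` skips stored zeros, but `.data` keeps them. A minimal sketch (the helper name `triples_from_coo` and the toy data are hypothetical) reads row, column and value from a single COO view via `tocoo()`, so the three arrays stay aligned, and gathers the names with NumPy integer-array indexing instead of a loop:

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix


def triples_from_coo(names: pd.Series, matches: csr_matrix):
    """Hypothetical helper: return (left, right, similarity) arrays.

    row, col and data all come from the same COO object, so they stay
    aligned even if the CSR matrix stores explicit zeros.
    """
    coo = matches.tocoo()
    values = names.to_numpy(dtype=object)  # plain object array of strings
    return values[coo.row], values[coo.col], coo.data


# Toy 2x3 CSR matrix whose second stored entry is an explicit zero
data = np.array([0.9, 0.0, 0.5])
indices = np.array([0, 1, 2])
indptr = np.array([0, 2, 3])
m = csr_matrix((data, indices, indptr), shape=(2, 3))
names = pd.Series(["alpha", "beta", "gamma"])

rows, cols = m.nonzero()
# nonzero() drops the explicit zero, while .data still holds it
print(len(rows), len(m.data))

left, right, sim = triples_from_coo(names, m)
print(list(zip(left, right, sim)))
```

With real matrices from `scipy.sparse.rand` explicit zeros are very unlikely, so this mainly matters when the matrix comes from another computation.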

Answer 1

As Divarak pointed out, slicing (indexing with the whole index arrays at once) works directly, so no loop is needed:

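matches_df = pd.DataFrame()  # collect the results in a new DataFrame
# Integer-array indexing gathers all names in one vectorized call
matches_df["left_side"] = name_vector.iloc[sparserows].values
matches_df["right_side"] = name_vector.iloc[sparsecols].values
matches_df["similarity"] = matches.data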
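A self-contained sketch for checking that the vectorized assignment reproduces the loop's output, and for timing both on your own machine. Assumptions: a smaller `nr_matches` so the loop finishes quickly, and random lowercase names built with NumPy as a stand-in for `pd.util.testing.rands_array`, which is not available in pandas 2.x.

```python
import time

import numpy as np
import pandas as pd
from scipy.sparse import rand

rng = np.random.default_rng(0)
nr_matches = 5_000  # smaller than the question's 10**5 so the loop finishes quickly
letters = np.array(list("abcdefghijklmnopqrstuvwxyz"))
# Stand-in for pd.util.testing.rands_array: random 10-letter names
name_vector = pd.Series(["".join(r) for r in rng.choice(letters, size=(nr_matches, 10))])
matches = rand(nr_matches, 10, density=0.2, format="csr")
sparserows, sparsecols = matches.nonzero()
nnz = len(sparserows)

# Sequential loop from the question, sized by the number of non-zeros
t0 = time.perf_counter()
left_loop = np.empty(nnz, dtype=object)
right_loop = np.empty(nnz, dtype=object)
sim_loop = np.zeros(nnz)
for i in range(nnz):
    left_loop[i] = name_vector.iat[sparserows[i]]
    right_loop[i] = name_vector.iat[sparsecols[i]]
    sim_loop[i] = matches.data[i]
t_loop = time.perf_counter() - t0

# Vectorized assignment from the answer
t0 = time.perf_counter()
matches_df = pd.DataFrame()
matches_df["left_side"] = name_vector.iloc[sparserows].values
matches_df["right_side"] = name_vector.iloc[sparsecols].values
matches_df["similarity"] = matches.data
t_vec = time.perf_counter() - t0

# Compare the two results element by element
same = (
    np.array_equal(matches_df["left_side"].to_numpy(), left_loop)
    and np.array_equal(matches_df["right_side"].to_numpy(), right_loop)
    and np.array_equal(matches_df["similarity"].to_numpy(), sim_loop)
)
print(f"loop: {t_loop:.3f} s, vectorized: {t_vec:.3f} s, identical: {same}")
```

The gain most likely comes from replacing one Python-level call per element with a single gather done in compiled code, not from using more threads; that would also explain why joblib, which adds dispatch and serialization overhead, ended up slower than the plain loop.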
