ValueError: setting an array element with a sequence()

Question

Before downvoting this question and marked as duplicate, let me just explain the issue, i tried all the possible solutions with similar question here on stack, but none of them worked. i also checked, setting an array element with a sequence" error could be improved. #6584

所以我在 3 个不同的特征上训练随机森林分类器，所有特征都有不同的维度，但我将它们重塑为 (-1,1)，这适合训练 RF（随机森林）模型，但它保持一次又一次地给出同样的错误，因为我已经尝试了所有可能的事情，这里是我正在使用的特征函数列表，

here , am computing the color features by simply taking mean/average of images in different color spaces,here am working on RGB,LAB,HSV and GRAY image respectively, as from the code below i have flattened all the possible feature vector array, from different color spaces.

def extract_color_feature(rgb_roi, lab_roi, hsv_roi, gray_roi):
    avg_rgb_per_row = np.average(rgb_roi, axis=0)
    avg_rgb = np.average(avg_rgb_per_row, axis=0).flatten()  

    avg_lab_per_row = np.average(lab_roi, axis=0)
    avg_lab = np.average(avg_lab_per_row, axis=0).flatten()  

    h, s, _ = cv2.split(hsv_roi)
    h_avg = cv2.mean(h)
    s_avg = cv2.mean(s)
    avg_hs = np.hstack([h_avg, s_avg]).flatten() 

    lbp = extract_lbp(gray_roi).flatten()  

    avg_rgb = np.array(avg_rgb, dtype=np.float32).flatten()
    avg_lab = np.array(avg_lab, dtype=np.float32).flatten()
    avg_hs = np.array(avg_hs, dtype=np.float32).flatten()
    lbp = np.array(lbp, dtype=np.float32).flatten()

    avg_color = np.hstack([avg_rgb, avg_lab, avg_hs, lbp])


    return avg_color.flatten()

in the following function i only computed histogram values from different color spaces again RGB,LAB,HSV color spaces used. as every histogram here performed on single color channel, so depth of every histogram feature will always be 1.

def compute_hist_feature(rgb_seg, hsv_seg, lab_seg, mask):
    b, g, r = cv2.split(rgb_seg)
    h, s, v = cv2.split(hsv_seg)
    l, a, b = cv2.split(lab_seg)

    r_equ = cv2.equalizeHist(r)
    g_equ = cv2.equalizeHist(g)
    b_equ = cv2.equalizeHist(b)
    r_hist = cv2.calcHist([r_equ], [0], mask, [8],
                          [0, 256]).flatten()  
    g_hist = cv2.calcHist([g_equ], [0], mask, [8],
                          [0, 256]).flatten()  
    b_hist = cv2.calcHist([b_equ], [0], mask, [8],
                          [0, 256]).flatten()  

    l_hist = cv2.calcHist([l], [0], mask, [8],
                          [0, 256]).flatten()  
    a_hist = cv2.calcHist([a], [0], mask, [8],
                          [0, 256]).flatten()  
    bb_hist = cv2.calcHist([b], [0], mask, [8],
                           [0, 256]).flatten()  

    h_hist = cv2.calcHist([h], [0], mask,
                          [8], [0, 256]).flatten()  
    s_hist = cv2.calcHist([s], [0], mask,
                          [8], [0, 256]).flatten()  

    h_hist = np.array(h_hist, dtype=np.float32).flatten()
    r_hist = np.array(r_hist, dtype=np.float32).flatten()
    g_hist = np.array(g_hist, dtype=np.float32).flatten()
    b_hist = np.array(b_hist, dtype=np.float32).flatten()
    s_hist = np.array(s_hist, dtype=np.float32).flatten()
    l_hist = np.array(l_hist, dtype=np.float32).flatten()
    a_hist = np.array(a_hist, dtype=np.float32).flatten()
    bb_hist = np.array(bb_hist, dtype=np.float32).flatten()


    hist = np.hstack([r_hist, g_hist, b_hist, h_hist, s_hist, l_hist, a_hist, bb_hist])

    return hist.flatten()

and finally am using location features , by simply flattened down the (x,y) cordinate list to form a feature array whhich will represent location feautre respectively.

cords = [t[::-1] for t in clusters_.get(disc)]  # reversing the list of tuples

disc_pts = np.array(cords, dtype=np.int32)
loc_feat = np.array(cords, dtype=np.float32).flatten()

here initially the cords represents to a array with depth 2 coz every pixel have two cordinates so, i flattened it , to form a array with depth of 1.

最后我将所有三个特征堆叠起来形成单个特征向量，

feat_vec = np.hstack([loc_feat, color_feat, hist_feat]).flatten()

here i have manually cheked the elements in all three feature vectors, in order to confirm the dtype, dimensions of array are not ambiguous to trigger the error, but everything looks fine to me.

这是第一个，位置特征

[  82.  209.   82.  210.   83.  210.   82.  211.   83.  211.   82.  212.
   83.  212.   84.  212.   81.  213.   82.  213.   83.  213.   84.  213.
   81.  214.   82.  214.   83.  214.   84.  214.   81.  215.   82.  215.
   83.  215.   84.  215.   81.  216.   82.  216.   83.  216.   84.  216.
   81.  217.   82.  217.   83.  217.   84.  217.   81.  218.   82.  218.
   83.  218.   84.  218.   85.  218.   81.  219.   82.  219.   83.  219.
   84.  219.   85.  219.   81.  220.   82.  220.   83.  220.   84.  220.
   85.  220.   81.  221.   82.  221.   83.  221.   84.  221.   85.  221.
   81.  222.   82.  222.   83.  222.   84.  222.   85.  222.   86.  222.
   81.  223.   82.  223.   83.  223.   84.  223.   85.  223.   86.  223.
   81.  224.   82.  224.   83.  224.   84.  224.   85.  224.   86.  224.
   81.  225.   82.  225.   83.  225.   84.  225.   85.  225.   86.  225.
   87.  225.   81.  226.   82.  226.   83.  226.   84.  226.   85.  226.
   86.  226.   87.  226.   81.  227.   82.  227.   83.  227.   84.  227.
   85.  227.   86.  227.   87.  227.   82.  228.   83.  228.   84.  228.
   85.  228.   86.  228.   87.  228.   82.  229.   83.  229.   84.  229.
   85.  229.   86.  229.   87.  229.   82.  230.   83.  230.   84.  230.
   85.  230.   86.  230.   87.  230.   82.  231.   83.  231.   84.  231.
   85.  231.   86.  231.   87.  231.   82.  232.   83.  232.   84.  232.
   85.  232.   86.  232.   87.  232.   82.  233.   83.  233.   84.  233.
   85.  233.   86.  233.   87.  233.   88.  233.   83.  234.   84.  234.
   85.  234.   86.  234.   87.  234.   88.  234.   83.  235.   84.  235.
   85.  235.   86.  235.   87.  235.   88.  235.   83.  236.   84.  236.
   85.  236.   86.  236.   87.  236.   88.  236.   83.  237.   84.  237.
   85.  237.   86.  237.   87.  237.   88.  237.   84.  238.   85.  238.
   86.  238.   87.  238.   84.  239.   85.  239.   86.  239.   87.  239.
   84.  240.   85.  240.   86.  240.   87.  240.   84.  241.   85.  241.
   86.  241.   87.  241.   85.  242.   86.  242.   87.  242.   85.  243.
   86.  243.]

这是彩色矢量

[  3.35917592e-01   3.25945705e-01   3.25065553e-01   3.34438205e-01
   2.04288393e-01   1.97153553e-01   1.85440078e-01   0.00000000e+00
   0.00000000e+00   0.00000000e+00   1.32209742e-02   0.00000000e+00
   0.00000000e+00   0.00000000e+00   2.62172282e-04   3.93258437e-04
   1.31086141e-04   9.36329598e-05   0.00000000e+00   0.00000000e+00
   0.00000000e+00   0.00000000e+00   0.00000000e+00   0.00000000e+00
   0.00000000e+00   0.00000000e+00   0.00000000e+00   0.00000000e+00
   0.00000000e+00   0.00000000e+00   0.00000000e+00   0.00000000e+00
   0.00000000e+00   0.00000000e+00   0.00000000e+00   0.00000000e+00
   0.00000000e+00   0.00000000e+00   9.98417616e-01   7.02247198e-04]

这是直方图特征向量

[   0.    0.    0.    0.    0.    0.    0.  169.    0.    0.    0.    0.
    0.    0.    0.  169.    0.  163.    6.    0.    0.    0.    0.    0.
    0.    0.    0.  169.    0.    0.    0.    0.  169.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.   29.   93.   47.
    0.    0.    0.    0.  169.    0.    0.    0.    0.    0.    0.  169.
    0.    0.    0.    0.]

因为可以看出所有三个数组的数据类型和维度都相同，但在使用 RF 或 SVC 分类器训练时仍然出现错误，当我不这样做时也是如此使用位置特征并仅使用颜色和直方图特征进行训练，则不会产生错误，并且训练和预测程序都可以正常工作。 但只有当所有三个特征叠加时出现错误。

当为 training.here 设置 RF 分类器时会抛出错误 _data 是先前计算的特征向量列表（~feat_vec~）。和 _labels 分别是每个数据（图像）样本的对应标签 1 或 0。

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(_data, _labels)

完整的错误追溯：

Traceback (most recent call last):
  File "~/openCV/saliency_detection/svm_train.py", line 59, in <module>
    model.fit(_data, _labels)
  File "/usr/lib/python2.7/site-packages/sklearn/ensemble/forest.py", line 247, in fit
    X = check_array(X, accept_sparse="csc", dtype=DTYPE)
  File "/usr/lib/python2.7/site-packages/sklearn/utils/validation.py", line 382, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)
ValueError: setting an array element with a sequence.

Answer 1

错误很可能是由于尝试从列表或不同长度的数组创建数组引起的。

没有 dtype 下面创建一个 object dtype 数组；使用数字 dtype 会引发此错误。

In [33]: np.array([[1,2,3],[4,5,6],[7,8,9,10]])
Out[33]: 
array([list([1, 2, 3]), list([4, 5, 6]), list([7, 8, 9, 10])],
      dtype=object)
In [34]: np.array([[1,2,3],[4,5,6],[7,8,9,10]], dtype=int)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-34-677fde45dbde> in <module>()
----> 1 np.array([[1,2,3],[4,5,6],[7,8,9,10]], dtype=int)

ValueError: setting an array element with a sequence.

无法从 3 个不同长度的列表创建二维数值数组。

In [37]: np.array([[1,2,3],[4,5,6],[7,8,9]], dtype=int)
Out[37]: 
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

在 traceback 变量名称更改中，但我猜问题可以追溯到您提供 fit 的 _data 变量。您没有显示创建 _data 的代码，而只给出了模糊的描述：

_data is a list of feature vectors( ~feat_vec~ )

从您的打印件来看，颜色和直方图似乎有大约 80 个值。但位置显然还有更多。这与您的说法一致

also when i don't use location feature and train only with color and histogram features, then it doesn't generate the error, and both the training and prediction program works fine. but only when all the three features stacked it geves the error.

您可以 hstack 他们的事实并没有告诉我们他们将如何在 np.array(....) 中工作。

In [35]: np.hstack([[1,2,3],[4,5,6],[7,8,9,10]])
Out[35]: array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

这是我以前回答过关于同一个 ValueError 的问题的列表：

https://whosebug.com/search?q=user%3A901925+ValueError%2Bsequence

ValueError: setting an array element with a sequence()

ValueError: setting an array element with a sequence()

python

arrays

numpy

random-forest

scikit-image