OpenCL: Matrix multiplication bypasses some work-items

Question

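While trying to implement matrix multiplication in OpenCL, I wrote my own approach; however, the work of some work-items seems to be overwritten by other work-items, and I really don't know how to deal with it.

What I am sure of is that the problem lies in the OpenCL program.

My host code is in C/C++.

The program builds and returns output (the output is wrong, but the program exits successfully).

This is my approach: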

__kernel void matrixMultiplication(
         __global double* matrix1,
         __global double* matrix2,
         __global double* output,
         const unsigned int ROWS_M1, // ROWS_M1 = 3
         const unsigned int COLS_M1, // COLS_M1 = 2
         const unsigned int ROWS_M2, // ROWS_M2 = 2
         const unsigned int COLS_M2, // COLS_M2 = 4
         const unsigned int ROWS_M3, // ROWS_M3 = 3
         const unsigned int COLS_M3) { // COLS_M3 = 4

    int i = get_global_id(0);
    int j = get_global_id(1);

    // for each value in the matrix1 (for each work-item)
    // and for each value in the "jth" row in the second matrix...
    // multiply the values and then add them according to the right offset.

    for(int k =0; k < COLS_M2; k++){
        int offsetM1 = (i*COLS_M1)+j;
        int offsetM2 = (j*COLS_M2)+k;
        int offsetM3 = (i*COLS_M3)+k;

        //output[i][k] += matrix1[i][j]*matrix2[j][k];
        output[offsetM3] += matrix1[offsetM1]*matrix2[offsetM2];
    }

}

The value passed for each "const unsigned int" is given in the code comments.

The matrix values are:

Matrix 1:

1 2
3 4
5 6

Matrix 2:

2 3 4 5
6 7 8 9

Actual output:

12 14 16 18
24 28 32 36
36 42 48 54

Expected output:

14 17 20 23
30 37 44 51
46 57 68 79

Answer 1

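I think you have the indexing wrong. *offsetM3* should equal *i\*COLS_M3+j*, *offsetM1* should equal *i\*COLS_M1+k*, and *offsetM2* should equal *k\*COLS_M2+j*, where *i* and *j* identify the output element and *k* runs over the shared dimension (0 to COLS_M1 - 1).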
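For reference, these offsets follow from the definition of the matrix product combined with row-major storage. The symbols A, B and C are labels introduced here for matrix1, matrix2 and output; the last line checks the first expected value from the question.

```latex
\[
C_{ij} = \sum_{k=0}^{\mathrm{COLS\_M1}-1} A_{ik}\, B_{kj},
\qquad
A_{ik} \mapsto i \cdot \mathrm{COLS\_M1} + k,\quad
B_{kj} \mapsto k \cdot \mathrm{COLS\_M2} + j,\quad
C_{ij} \mapsto i \cdot \mathrm{COLS\_M3} + j
\]
\[
C_{00} = A_{00} B_{00} + A_{01} B_{10} = 1 \cdot 2 + 2 \cdot 6 = 14
\]
```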

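Write the matrices out on paper and do the math, then write them as flat arrays, the way they are laid out in memory, and multiply them; you will then see the index pattern. Remember that each thread (work-item) is responsible for one element of the new matrix. If you change the index into the new matrix inside the for loop, you are no longer following the one-work-item-per-element logic; if that is what you want, you should think about a different scheme. Hope this helps.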
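The one-work-item-per-element idea can be tried out on the CPU before touching the kernel. The following is only a sketch in plain C: `output_element` plays the role of a single work-item and `multiply_reference` plays the role of a 2D launch over the output matrix. Both function names and the parameter names are hypothetical, not taken from the original post.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU stand-in for ONE work-item: computes output element
   (i, j) and touches no other element of the output. */
double output_element(const double *m1, const double *m2,
                      size_t cols_m1, size_t cols_m2,
                      size_t i, size_t j)
{
    double acc = 0.0;
    /* k runs over the shared dimension (columns of m1 == rows of m2). */
    for (size_t k = 0; k < cols_m1; k++) {
        acc += m1[i * cols_m1 + k] * m2[k * cols_m2 + j];
    }
    return acc;
}

/* Hypothetical stand-in for the launch: one call per output element,
   i.e. rows_m1 x cols_m2 independent "work-items". */
void multiply_reference(const double *m1, const double *m2, double *out,
                        size_t rows_m1, size_t cols_m1,
                        size_t rows_m2, size_t cols_m2)
{
    assert(cols_m1 == rows_m2); /* the inner dimensions must agree */
    for (size_t i = 0; i < rows_m1; i++) {
        for (size_t j = 0; j < cols_m2; j++) {
            out[i * cols_m2 + j] =
                output_element(m1, m2, cols_m1, cols_m2, i, j);
        }
    }
}
```

Because each call writes exactly one output slot, the order in which the calls run does not matter, which is the property a kernel needs when its work-items execute concurrently.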

Answer 2

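TL;DR

The problem was my loop. Don't write it that way; it's bad.

Now that I am done with my university grades and everything, I can take some time to write a proper answer to my own question, so that others who run into the same problem will hopefully find it.

The way I had written the loop, there were cases where several work-items would overlap with others, producing different results from one run to the next; basically a mutual-exclusion problem, the kind you could easily solve with semaphores.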
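To make the overlap concrete, here is a small CPU sketch in plain C of one possible interleaving of the kernel from the question: every work-item (i, j) reads output[i][k] before any other work-item has written its result back, so the `+=` degenerates into a plain store and the last writer wins. The function name `simulate_lost_updates` is hypothetical, and the real execution order on a device is not defined; this is just one ordering, chosen because it reproduces the wrong output posted in the question.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simulation of the racy kernel from the question.
   Work-items are indexed by (i, j) over matrix1, and each one loops
   over k, updating output[i][k]. Work-items with the same i but a
   different j therefore target the same output slots. */
void simulate_lost_updates(const double *m1, const double *m2, double *out,
                           size_t rows_m1, size_t cols_m1, size_t cols_m2)
{
    assert(m1 != NULL && m2 != NULL && out != NULL);

    for (size_t n = 0; n < rows_m1 * cols_m2; n++) {
        out[n] = 0.0;
    }

    for (size_t i = 0; i < rows_m1; i++) {
        for (size_t j = 0; j < cols_m1; j++) {      /* one work-item */
            for (size_t k = 0; k < cols_m2; k++) {
                /* Assumption: the value was read while it was still 0,
                   before the competing work-item stored its product. */
                double stale = 0.0;
                out[i * cols_m2 + k] =
                    stale + m1[i * cols_m1 + j] * m2[j * cols_m2 + k];
            }
        }
    }
}
```

With the matrices from the question, only the contribution of the last work-item to write each slot (here j = 1) survives, giving 12 14 16 18 in the first row instead of 14 17 20 23.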

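The solution was to rewrite the whole loop, using a different approach to compute the specific offsets.

Here is the source that solved my problem, in case anyone finds it interesting or useful: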

#pragma OPENCL EXTENSION cl_khr_fp64 : enable
__kernel void multiplyMatrix(
   __global double* matrix1,
   __global double* matrix2,
   __global double* output,
   const unsigned int ROWS_M1,
   const unsigned int COLS_M1,
   const unsigned int ROWS_M2,
   const unsigned int COLS_M2,
   const unsigned int ROWS_M3,
   const unsigned int COLS_M3) {

   // One work-item per OUTPUT element: i is the row and j the column of
   // output, so the global size is assumed to be ROWS_M3 x COLS_M3.
   int i = get_global_id(0);
   int j = get_global_id(1);
   if (i >= (int)ROWS_M3 || j >= (int)COLS_M3) return;

   // Accumulate privately; no other work-item touches this element.
   double aux = 0.0;
   for (int k = 0; k < (int)COLS_M1; k++) {
       int offsetM1 = (i*COLS_M1)+k;
       int offsetM2 = (k*COLS_M2)+j;

       // output[i][j] += matrix1[i][k]*matrix2[k][j]
       aux += matrix1[offsetM1]*matrix2[offsetM2];
   }

   // Single write per work-item, after the loop.
   int offsetM3 = (i*COLS_M3)+j;
   output[offsetM3] = aux;
}


