Do threads in the same block execute sequentially?
I have the following simple code:
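#include <stdio.h>

// Each thread prints its global index.
__global__ void loop()
{
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    printf("This is iteration number %d\n", i);
}

int main()
{
    int N = 10;
    loop<<<N, N>>>();          // N blocks of N threads each
    cudaDeviceSynchronize();   // wait for the kernel; this also flushes device printf output
    return 0;
}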
When I run it, I get the following output:
This is iteration number 20
This is iteration number 21
This is iteration number 22
This is iteration number 23
This is iteration number 24
This is iteration number 25
This is iteration number 26
This is iteration number 27
This is iteration number 28
This is iteration number 29
This is iteration number 70
This is iteration number 71
This is iteration number 72
This is iteration number 73
This is iteration number 74
This is iteration number 75
This is iteration number 76
This is iteration number 77
This is iteration number 78
This is iteration number 79
This is iteration number 0
This is iteration number 1
This is iteration number 2
This is iteration number 3
This is iteration number 4
This is iteration number 5
This is iteration number 6
This is iteration number 7
This is iteration number 8
This is iteration number 9
This is iteration number 50
This is iteration number 51
This is iteration number 52
This is iteration number 53
This is iteration number 54
This is iteration number 55
This is iteration number 56
This is iteration number 57
This is iteration number 58
This is iteration number 59
This is iteration number 10
This is iteration number 11
This is iteration number 12
This is iteration number 13
This is iteration number 14
This is iteration number 15
This is iteration number 16
This is iteration number 17
This is iteration number 18
This is iteration number 19
This is iteration number 60
This is iteration number 61
This is iteration number 62
This is iteration number 63
This is iteration number 64
This is iteration number 65
This is iteration number 66
This is iteration number 67
This is iteration number 68
This is iteration number 69
This is iteration number 30
This is iteration number 31
This is iteration number 32
This is iteration number 33
This is iteration number 34
This is iteration number 35
This is iteration number 36
This is iteration number 37
This is iteration number 38
This is iteration number 39
This is iteration number 80
This is iteration number 81
This is iteration number 82
This is iteration number 83
This is iteration number 84
This is iteration number 85
This is iteration number 86
This is iteration number 87
This is iteration number 88
This is iteration number 89
This is iteration number 40
This is iteration number 41
This is iteration number 42
This is iteration number 43
This is iteration number 44
This is iteration number 45
This is iteration number 46
This is iteration number 47
This is iteration number 48
This is iteration number 49
This is iteration number 90
This is iteration number 91
This is iteration number 92
This is iteration number 93
This is iteration number 94
This is iteration number 95
This is iteration number 96
This is iteration number 97
This is iteration number 98
This is iteration number 99
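As you can see, the threads in the same block appear to execute sequentially (in order). For example, the group 0 to 9 can appear anywhere relative to the other groups, but within it 0 always comes first, then 1, then 2, and so on.
Is this a coincidence (I have run it many times and always get the same result), or do threads in the same block always execute in order?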
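CUDA threads can execute in any order (unless you explicitly control the ordering). That is the mental model every CUDA programmer should have, and basing a program's behavior on other observations or principles is risky.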
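A common way to make the *result* order deterministic, instead of trying to control execution order, is to have each thread write into a slot indexed by its global ID, copy the array back, and print it on the host. This is only a sketch assuming the same `<<<10,10>>>` launch as in the question; `record_ids` and `collect_ids` are hypothetical names, not part of any CUDA API.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Each thread writes its own global index into its own slot. Every thread
// owns a distinct slot, so the final array does not depend on the order in
// which threads or blocks happen to run.
__global__ void record_ids(int *out, int n)
{
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i < n)
        out[i] = i;
}

// Hypothetical host helper: launches `blocks` x `threads` threads and returns
// their results in index order, or an empty vector if a CUDA call fails.
std::vector<int> collect_ids(int blocks, int threads)
{
    assert(blocks > 0 && threads > 0);
    int n = blocks * threads;
    std::vector<int> host(n);
    int *dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(int)) != cudaSuccess)
        return {};

    record_ids<<<blocks, threads>>>(dev, n);
    cudaError_t err = cudaGetLastError();  // catches launch-configuration errors

    // cudaMemcpy on the default stream is ordered after the kernel, so no
    // explicit cudaDeviceSynchronize() is needed before it.
    if (err == cudaSuccess)
        err = cudaMemcpy(host.data(), dev, n * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(dev);
    if (err != cudaSuccess)
        return {};
    return host;
}
```

Looping over `collect_ids(10, 10)` on the host and printing each element gives 0 through 99 in order on every run, however the blocks were scheduled, because the ordering comes from the array index rather than from execution timing.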
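On the device there is currently the concept of a warp, a group of threads that execute in lockstep. The warp size is currently 32, which means the first 10 threads of each block (here, every thread in the block) belong to the same warp and therefore execute in lockstep. So in your case, the printf calls within a block all happen to come from the same warp.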
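For a one-dimensional block, the CUDA Programming Guide describes warps as groups of consecutive thread indices starting at thread 0, so a thread's warp within its block is `threadIdx.x / warpSize`. The sketch below records that value for every thread; `warp_of_thread` and `warp_ids` are hypothetical names, and it assumes 1-D blocks. With 10 threads per block every entry is 0, i.e. each block is a single, partially filled warp.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Records, for every thread, which warp of its block it belongs to.
// Assumes a 1-D block, so the linear thread index is just threadIdx.x.
__global__ void warp_of_thread(int *warp_id, int n)
{
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i < n)
        warp_id[i] = threadIdx.x / warpSize;  // warpSize is a built-in variable
}

// Hypothetical host helper: returns the per-thread warp index for a launch of
// `blocks` x `threads`, or an empty vector if a CUDA call fails.
std::vector<int> warp_ids(int blocks, int threads)
{
    assert(blocks > 0 && threads > 0);
    int n = blocks * threads;
    std::vector<int> host(n);
    int *dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(int)) != cudaSuccess)
        return {};

    warp_of_thread<<<blocks, threads>>>(dev, n);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(host.data(), dev, n * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(dev);
    if (err != cudaSuccess)
        return {};
    return host;
}
```

With `warp_ids(1, 64)` instead, threads 0 to 31 report warp 0 and threads 32 to 63 report warp 1 (on hardware where `warpSize` is 32), which is where output from different warps of one block could start to interleave.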
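So when any thread in a warp calls printf, they all do. That synchronized activity has to show up in your output somehow, and you are simply observing that it shows up with the same pattern every time.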
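This doesn't mean that execution is serialized in general; it is more likely a consequence of how printf handles concurrent threads within a warp. Since all printf output goes temporarily into the same buffer, it seems logical that funneling the output of the 10 threads in a warp into a single buffer would follow some pattern, and "random" hardly seems a logical or likely choice for that pattern. At the very least, it is not surprising that the behavior appears consistent.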
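Since the details of the device printf implementation are largely undocumented, I don't think using printf to discover details of device behavior is very reliable.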
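If you want to look at ordering without going through `printf`, one option is to have each thread take a "ticket" from a global counter with `atomicAdd`. This is a sketch with hypothetical names (`take_ticket`, `arrival_tickets`). The tickets only show the order in which the atomic operations were serviced on one run; that order is itself not guaranteed and should not be relied on. The only property that is guaranteed is that the tickets form a permutation of 0 to n-1.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Each thread atomically increments a shared counter; the value returned by
// atomicAdd (the counter's old value) is this thread's arrival ticket.
__global__ void take_ticket(unsigned int *counter, unsigned int *ticket, int n)
{
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i < n)
        ticket[i] = atomicAdd(counter, 1u);
}

// Hypothetical host helper: returns ticket[i] for every global thread index i,
// or an empty vector if a CUDA call fails.
std::vector<unsigned int> arrival_tickets(int blocks, int threads)
{
    assert(blocks > 0 && threads > 0);
    int n = blocks * threads;
    std::vector<unsigned int> host(n);
    unsigned int *counter = nullptr;
    unsigned int *ticket = nullptr;
    if (cudaMalloc(&counter, sizeof(unsigned int)) != cudaSuccess)
        return {};
    if (cudaMalloc(&ticket, n * sizeof(unsigned int)) != cudaSuccess) {
        cudaFree(counter);
        return {};
    }

    cudaError_t err = cudaMemset(counter, 0, sizeof(unsigned int));  // start at 0
    if (err == cudaSuccess) {
        take_ticket<<<blocks, threads>>>(counter, ticket, n);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(host.data(), ticket, n * sizeof(unsigned int),
                         cudaMemcpyDeviceToHost);

    cudaFree(counter);
    cudaFree(ticket);
    if (err != cudaSuccess)
        return {};
    return host;
}
```

Comparing tickets across runs or launch configurations may show a pattern similar to the printf output, but like that output it describes one run on one device, not a rule to build on.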