Generating random numbers concurrently in Go
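I'm new to Go and to concurrent/parallel programming. To try out (and hopefully see) the performance benefits of goroutines, I put together a small test program that simply generates 100 million random ints: first in a single goroutine, and then in as many goroutines as reported by runtime.NumCPU().
However, I consistently get worse performance with more goroutines than with a single one. I assume I'm missing something important in my program design or in the way I use goroutines/channels/other Go features. Any feedback is much appreciated.
I attach the code below.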
package main

import "fmt"
import "time"
import "math/rand"
import "runtime"

func main() {
	// Figure out how many CPUs are available and tell Go to use all of them
	numThreads := runtime.NumCPU()
	runtime.GOMAXPROCS(numThreads)
	// Number of random ints to generate
	var numIntsToGenerate = 100000000
	// Number of ints to be generated by each spawned goroutine thread
	var numIntsPerThread = numIntsToGenerate / numThreads
	// Channel for communicating from goroutines back to main function
	ch := make(chan int, numIntsToGenerate)
	// Slices to keep resulting ints
	singleThreadIntSlice := make([]int, numIntsToGenerate, numIntsToGenerate)
	multiThreadIntSlice := make([]int, numIntsToGenerate, numIntsToGenerate)
	fmt.Printf("Initiating single-threaded random number generation.\n")
	startSingleRun := time.Now()
	// Generate all of the ints from a single goroutine, retrieve the expected
	// number of ints from the channel and put in target slice
	go makeRandomNumbers(numIntsToGenerate, ch)
	for i := 0; i < numIntsToGenerate; i++ {
		singleThreadIntSlice = append(singleThreadIntSlice, (<-ch))
	}
	elapsedSingleRun := time.Since(startSingleRun)
	fmt.Printf("Single-threaded run took %s\n", elapsedSingleRun)
	fmt.Printf("Initiating multi-threaded random number generation.\n")
	startMultiRun := time.Now()
	// Run the designated number of goroutines, each of which generates its
	// expected share of the total random ints, retrieve the expected number
	// of ints from the channel and put in target slice
	for i := 0; i < numThreads; i++ {
		go makeRandomNumbers(numIntsPerThread, ch)
	}
	for i := 0; i < numIntsToGenerate; i++ {
		multiThreadIntSlice = append(multiThreadIntSlice, (<-ch))
	}
	elapsedMultiRun := time.Since(startMultiRun)
	fmt.Printf("Multi-threaded run took %s\n", elapsedMultiRun)
}

func makeRandomNumbers(numInts int, ch chan int) {
	source := rand.NewSource(time.Now().UnixNano())
	generator := rand.New(source)
	for i := 0; i < numInts; i++ {
		ch <- generator.Intn(numInts * 100)
	}
}
First, let's correct and optimize a few things in your code:
As of Go 1.5, GOMAXPROCS defaults to the number of available CPU cores, so there is no need to set it (although it does no harm).
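If you want to confirm the default on your own machine, a minimal sketch like the following may help. It relies on the documented behaviour that runtime.GOMAXPROCS called with an argument below 1 only reports the current setting; the helper name currentProcs is made up for this illustration.

```go
package main

import (
	"fmt"
	"runtime"
)

// currentProcs (hypothetical helper) reports the current GOMAXPROCS setting
// without changing it: an argument below 1 only queries the value.
func currentProcs() int {
	return runtime.GOMAXPROCS(0)
}

func main() {
	fmt.Println("logical CPUs:", runtime.NumCPU())
	fmt.Println("GOMAXPROCS:  ", currentProcs())
}
```

On Go 1.5 or later the two numbers should normally match unless the GOMAXPROCS environment variable or an earlier call has changed the setting.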
Numbers to generate:
var numIntsToGenerate = 100000000
var numIntsPerThread = numIntsToGenerate / numThreads
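If numThreads is something like 3, then in the multi-goroutine case fewer numbers are generated (due to integer division), while the collecting loop still waits for numIntsToGenerate values and would block forever on the missing ones. So let's correct it: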
numIntsToGenerate = numIntsPerThread * numThreads
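To make the rounding concrete, here is a small sketch of the same arithmetic for 3 workers. The helper name splitEvenly is an assumption introduced only for this example.

```go
package main

import "fmt"

// splitEvenly (hypothetical helper) returns how many ints each worker
// generates and the total actually produced after integer division.
func splitEvenly(total, workers int) (perWorker, adjustedTotal int) {
	perWorker = total / workers         // integer division rounds down
	adjustedTotal = perWorker * workers // what the workers really send
	return perWorker, adjustedTotal
}

func main() {
	perWorker, adjusted := splitEvenly(100000000, 3)
	fmt.Println(perWorker, adjusted) // prints "33333333 99999999"
}
```

With 3 workers one value goes missing, which is why the receiving loop has to count up to the adjusted total rather than the original one.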
There is no need for a buffer of 100 million values; reduce it to a sensible value (e.g. 1000):
ch := make(chan int, 1000)
If you want to use append(), the slices you create should have 0 length (and proper capacity):
singleThreadIntSlice := make([]int, 0, numIntsToGenerate)
multiThreadIntSlice := make([]int, 0, numIntsToGenerate)
But in your case that is unnecessary: since only 1 goroutine is collecting the results, you can simply use indexing and create the slices like this:
singleThreadIntSlice := make([]int, numIntsToGenerate)
multiThreadIntSlice := make([]int, numIntsToGenerate)
And when collecting results:
for i := 0; i < numIntsToGenerate; i++ {
	singleThreadIntSlice[i] = <-ch
}
// ...
for i := 0; i < numIntsToGenerate; i++ {
	multiThreadIntSlice[i] = <-ch
}
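The append() remark is easy to verify in isolation. The sketch below uses two made-up helper functions to contrast a slice created with a non-zero length (as in the question) with one created with length 0 and the same capacity.

```go
package main

import "fmt"

// appendToSized mimics the question's pattern: the slice already has
// length n, so append places the new values after n zero elements.
func appendToSized(n int) []int {
	s := make([]int, n, n)
	for i := 0; i < n; i++ {
		s = append(s, i+1)
	}
	return s
}

// appendToEmpty starts from length 0 with capacity n, so the appended
// values fill the preallocated space and nothing needs to be reallocated.
func appendToEmpty(n int) []int {
	s := make([]int, 0, n)
	for i := 0; i < n; i++ {
		s = append(s, i+1)
	}
	return s
}

func main() {
	fmt.Println(appendToSized(3)) // prints "[0 0 0 1 2 3]"
	fmt.Println(appendToEmpty(3)) // prints "[1 2 3]"
}
```

In the question's program this means each result slice ends up with 200 million elements, half of them zeros, and has to be reallocated and copied as it grows.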
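OK. The code is better now. If you run it, you will still find that the multi-goroutine version runs slower. Why is that?
Because controlling, synchronizing and collecting results from multiple goroutines does have overhead. If the task each goroutine performs is small, the communication overhead dominates and overall performance drops.
Your case is exactly that. Once your rand.Rand is set up, generating a single random number is very fast.
Let's make your "task" big enough that we can see the benefit of multiple goroutines: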
// 1 million is enough now:
var numIntsToGenerate = 1000 * 1000

func makeRandomNumbers(numInts int, ch chan int) {
	source := rand.NewSource(time.Now().UnixNano())
	generator := rand.New(source)
	for i := 0; i < numInts; i++ {
		// Kill time, do some processing:
		for j := 0; j < 1000; j++ {
			generator.Intn(numInts * 100)
		}
		// and now return a single random number
		ch <- generator.Intn(numInts * 100)
	}
}
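In this case, to get a single random number we generate 1000 random numbers and throw them away (to do some calculation / kill time) before generating the one we return. We do this so that the computation time of the worker goroutines outweighs the communication overhead of multiple goroutines.
Running the app now, my results on a 4-core machine: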
Initiating single-threaded random number generation.
Single-threaded run took 2.440604504s
Initiating multi-threaded random number generation.
Multi-threaded run took 987.946758ms
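The multi-goroutine version runs about 2.5 times faster. This means that if your goroutines delivered random numbers in blocks of 1000, you would see roughly 2.5 times faster execution (compared to generation in a single goroutine).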
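Delivering numbers in blocks could look roughly like the sketch below. It is only an illustration under a few assumptions: the names makeRandomBatches, generateBatched and batchSize are invented here, the seeds are fixed per worker (instead of time.Now().UnixNano()) to keep runs reproducible, and a sync.WaitGroup closes the channel once all workers are done.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

// batchSize is an assumed value, taken from the "blocks of 1000" figure.
const batchSize = 1000

// makeRandomBatches generates numInts random ints and sends them in slices
// of up to batchSize, so one channel operation carries many values.
func makeRandomBatches(numInts int, seed int64, ch chan<- []int) {
	generator := rand.New(rand.NewSource(seed))
	for remaining := numInts; remaining > 0; {
		n := batchSize
		if remaining < n {
			n = remaining
		}
		batch := make([]int, n)
		for i := range batch {
			batch[i] = generator.Intn(numInts * 100)
		}
		ch <- batch
		remaining -= n
	}
}

// generateBatched runs the given number of worker goroutines and
// concatenates every batch they send into one result slice.
func generateBatched(perWorker, workers int) []int {
	ch := make(chan []int, workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(seed int64) {
			defer wg.Done()
			makeRandomBatches(perWorker, seed, ch)
		}(int64(w) + 1)
	}
	// Close the channel once every worker has finished, ending the range loop.
	go func() {
		wg.Wait()
		close(ch)
	}()
	result := make([]int, 0, perWorker*workers)
	for batch := range ch {
		result = append(result, batch...)
	}
	return result
}

func main() {
	nums := generateBatched(250000, 4)
	fmt.Println("generated", len(nums), "numbers")
}
```

Each worker owns its own rand.Rand, as in the original code, so the generators are never shared between goroutines. Whether this reaches the same 2.5 times speedup depends on the machine and has not been measured here.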
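One last note:
Your single-goroutine version also uses multiple goroutines: 1 to generate the numbers and 1 to collect the results. The collector most likely does not fully utilize a CPU core and mostly just waits for results, but still: 2 CPU cores are in use. Let's estimate that "1.5" CPU cores are utilized. The multi-goroutine version utilizes 4 CPU cores. As a rough estimate: 4 / 1.5 ≈ 2.67, which is very close to our performance gain.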
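If you really want to generate the random numbers in parallel, then each task should generate its numbers and return them in one go, rather than generating one number at a time and feeding each one to a channel, because reading from and writing to the channel slows things down in the multi-goroutine case. Below is the modified code, in which each task generates all of its numbers in one go; this performs better in the multi-goroutine case. I also use a slice of slices to collect the results from the multiple goroutines.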
package main

import "fmt"
import "time"
import "math/rand"
import "runtime"

func main() {
	// Figure out how many CPUs are available and tell Go to use all of them
	numThreads := runtime.NumCPU()
	runtime.GOMAXPROCS(numThreads)
	// Number of random ints to generate
	var numIntsToGenerate = 100000000
	// Number of ints to be generated by each spawned goroutine thread
	var numIntsPerThread = numIntsToGenerate / numThreads
	// Channel for communicating from goroutines back to main function
	ch := make(chan []int)
	fmt.Printf("Initiating single-threaded random number generation.\n")
	startSingleRun := time.Now()
	// Generate all of the ints from a single goroutine, retrieve the expected
	// number of ints from the channel and put in target slice
	go makeRandomNumbers(numIntsToGenerate, ch)
	singleThreadIntSlice := <-ch
	elapsedSingleRun := time.Since(startSingleRun)
	fmt.Printf("Single-threaded run took %s\n", elapsedSingleRun)
	fmt.Printf("Initiating multi-threaded random number generation.\n")
	multiThreadIntSlice := make([][]int, numThreads)
	startMultiRun := time.Now()
	// Run the designated number of goroutines, each of which generates its
	// expected share of the total random ints, retrieve the expected number
	// of ints from the channel and put in target slice
	for i := 0; i < numThreads; i++ {
		go makeRandomNumbers(numIntsPerThread, ch)
	}
	for i := 0; i < numThreads; i++ {
		multiThreadIntSlice[i] = <-ch
	}
	elapsedMultiRun := time.Since(startMultiRun)
	fmt.Printf("Multi-threaded run took %s\n", elapsedMultiRun)
	// To avoid not used warning
	fmt.Print(len(singleThreadIntSlice))
}

func makeRandomNumbers(numInts int, ch chan []int) {
	source := rand.NewSource(time.Now().UnixNano())
	generator := rand.New(source)
	result := make([]int, numInts)
	for i := 0; i < numInts; i++ {
		result[i] = generator.Intn(numInts * 100)
	}
	ch <- result
}
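If a single flat slice is preferred over a slice of slices, one possible variation (not part of the code above) is to let every goroutine fill its own disjoint region of one preallocated slice and wait with a sync.WaitGroup instead of a channel. The names fillRandom and generateParallel, the limit parameter and the fixed seeds are assumptions made for this sketch.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

// fillRandom fills dst with random ints in [0, limit) using its own generator.
func fillRandom(dst []int, limit int, seed int64) {
	generator := rand.New(rand.NewSource(seed))
	for i := range dst {
		dst[i] = generator.Intn(limit)
	}
}

// generateParallel splits one result slice into contiguous chunks and lets
// each goroutine fill only its own chunk, so no locking is required.
func generateParallel(total, workers, limit int) []int {
	result := make([]int, total)
	chunk := (total + workers - 1) / workers // ceiling division covers remainders
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		start := w * chunk
		if start >= total {
			break
		}
		end := start + chunk
		if end > total {
			end = total
		}
		wg.Add(1)
		go func(part []int, seed int64) {
			defer wg.Done()
			fillRandom(part, limit, seed)
		}(result[start:end], int64(w)+1)
	}
	wg.Wait() // every chunk is filled once all workers have called Done
	return result
}

func main() {
	nums := generateParallel(1000000, 4, 1000)
	fmt.Println("generated", len(nums), "numbers")
}
```

Because the chunks never overlap, the goroutines do not write to the same elements, and the ceiling division means no values are lost when the total is not divisible by the number of workers.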