Why is cgo's performance so slow? Is there something wrong with my testing code?

Question

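I am running a test that compares the execution time of a cgo function and a pure Go function, calling each 100 million times. The cgo function takes much longer than the Go function, and the result confuses me. My testing code is: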

package main

import (
    "fmt"
    "time"
)

/*
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void show() {

}

*/
// #cgo LDFLAGS: -lstdc++
import "C"

//import "fmt"

func show() {

}

func main() {
    now := time.Now()
    for i := 0; i < 100000000; i = i + 1 {
        C.show()
    }
    end_time := time.Now()

    var dur_time time.Duration = end_time.Sub(now)
    var elapsed_min float64 = dur_time.Minutes()
    var elapsed_sec float64 = dur_time.Seconds()
    var elapsed_nano int64 = dur_time.Nanoseconds()
    fmt.Printf("cgo show function elasped %f minutes or \nelapsed %f seconds or \nelapsed %d nanoseconds\n",
        elapsed_min, elapsed_sec, elapsed_nano)

    now = time.Now()
    for i := 0; i < 100000000; i = i + 1 {
        show()
    }
    end_time = time.Now()

    dur_time = end_time.Sub(now)
    elapsed_min = dur_time.Minutes()
    elapsed_sec = dur_time.Seconds()
    elapsed_nano = dur_time.Nanoseconds()
    fmt.Printf("go show function elasped %f minutes or \nelapsed %f seconds or \nelapsed %d nanoseconds\n",
        elapsed_min, elapsed_sec, elapsed_nano)

    var input string
    fmt.Scanln(&input)
}

The result is:

cgo show function elasped 0.368096 minutes or 
elapsed 22.085756 seconds or 
elapsed 22085755775 nanoseconds

go show function elasped 0.000654 minutes or 
elapsed 0.039257 seconds or 
elapsed 39257120 nanoseconds

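The result shows that calling the C function is much slower than calling the Go function. Is there something wrong with my testing code?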

My system is: Mac OS X 10.9.4 (13E28)

Answer 1

There is some overhead in calling C functions from Go. This cannot be changed.

Answer 2

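As you've discovered, the overhead of calling C/C++ code via CGo is fairly high. So in general, you are best off minimising the number of CGo calls you make. For the example above, rather than calling a CGo function repeatedly in a loop, it may make sense to move the loop down into C.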

There are a number of aspects of how the Go runtime sets up its threads that can break the expectations of many pieces of C code:

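Goroutines run on relatively small stacks, with stack growth handled through segmented stacks (old versions) or by copying (new versions).
Threads created by the Go runtime may not interact properly with libpthread's thread-local storage implementation.
The Go runtime's UNIX signal handlers may interfere with traditional C or C++ code.
Go reuses OS threads to run multiple goroutines. If the C code makes a blocking system call or otherwise monopolises the thread, it could be detrimental to other goroutines.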

For these reasons, CGo takes the safe approach of running the C code in a separate thread set up with a traditional stack.

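If you are coming from a language like Python, where it isn't uncommon to rewrite code hotspots in C as a way to speed up a program, you will be disappointed. At the same time, the performance gap between equivalent C and Go code is much smaller.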

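In general, I reserve CGo for interfacing with existing libraries, possibly with small C wrapper functions that reduce the number of calls I need to make from Go.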

Answer 3

Update to James's answer: it seems that there is no thread switch in the current implementation.

See this thread on golang-nuts:

There's always going to be some overhead. It's more expensive than a simple function call but significantly less expensive than a context switch (agl is remembering an earlier implementation; we cut out the thread switch before the public release). Right now the expense is basically just having to do a full register set switch (no kernel involvement). I'd guess it's comparable to ten function calls.

Also see the "cgo is not Go" blog post:

C doesn’t know anything about Go’s calling convention or growable stacks, so a call down to C code must record all the details of the goroutine stack, switch to the C stack, and run C code which has no knowledge of how it was invoked, or the larger Go runtime in charge of the program.

So cgo has overhead because it performs a stack switch, not a thread switch.

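It saves and restores all registers when a C function is called, which is not required when a Go function or an assembly function is called.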

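Besides that, cgo's pointer-passing rules restrict passing Go pointers directly to C code. A common workaround is to allocate with C.malloc, which introduces additional allocations.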
