Python 中的字符串连接比 Go 快得多

Question

我正在考虑使用 Go 编写一个主要处理文本的小程序。我很确定，根据我对 Go 的了解和 Python Go 会快得多。我实际上并没有特别需要疯狂的速度，但我想了解一下 Go。

"Go is going to be faster" 这个想法得到了一个简单测试的支持：

# test.py
print("Hello world")

$ time python dummy.py
Hello world

real    0m0.029s
user    0m0.019s
sys 0m0.010s

// test.go
package main

import "fmt"

func main() {

    fmt.Println("hello world")
}

$ time ./test
hello world

real    0m0.001s
user    0m0.001s
sys 0m0.000s

就原始启动速度而言看起来不错（完全符合预期）。高度不科学的理由：

$ strace python test.py 2>&1 | wc -l
1223
$ strace ./test 2>&1 | wc -l
174

然而，我的下一个人为测试是 Go 在使用字符串时有多快，我期待着同样被 Go 的原始速度所震撼。所以，这令人惊讶：

# test2.py
s = ""

for i in range(1000000):
    s += "a"

$ time python test2.py
real    0m0.179s
user    0m0.145s
sys 0m0.013s

// test2.go
package main

func main() {

    s := ""

    for i:= 0; i < 1000000; i++ {
        s += "a";
    }
}

$ time ./test2
real    0m56.840s
user    1m50.836s
sys 0m17.653

所以 Go 比 Python 慢数百倍。

现在，我知道这可能是由于 Schlemiel the Painter's algorithm，这解释了为什么 Go 实现在 i 中是二次的（i 大 10 倍导致减速 100 倍） .

然而，Python 实现似乎快得多：10 倍以上的循环只会减慢两倍。如果您连接 str(i)，同样的效果仍然存在，所以我怀疑对 s = 100000 * 'a' 进行了某种神奇的 JIT 优化。如果我最后 print(s) 也不会慢很多，所以变量没有被优化。

抛开连接方法的天真（每种语言中肯定有更多惯用的方法），这里有什么我误解了，还是 Go 比 Python 到运行在处理字符串时必须处理 C/C++ 风格的算法问题的情况下（在这种情况下，直接的 Go 端口可能不像 uh-may-就像我希望的那样，你知道，不必思考事情并做功课）？

或者我是否运行遇到 Python 恰好工作良好，但在更复杂的使用下崩溃的情况？

使用的版本： Python 3.8.2，Go 1.14.2

Answer 1

嗯。你永远不应该以这种方式使用字符串连接:-)

开始，尝试 strings.Buider

package main

import (
 "strings"
)

func main() {

    var b1 strings.Builder

    for i:= 0; i < 1000000; i++ {
        b1.WriteString("a");
    }
}

Answer 2

TL;DR 总结：基本上，您正在测试两个实现的分配器/垃圾收集器，并在 Python 侧对比例进行大量加权（可以说是偶然的，但这是 Python 人们在某些时候进行了优化）。

将我的评论扩展为真实答案：

Go 和 Python 都对字符串进行了计数，即字符串被实现为 two-element header 包含长度（字节数或 Python 3个字符串，Unicode字符数）和数据指针。
Go 和 Python 都是 garbage-collected (GCed) 语言。也就是说，在这两种语言中，您都可以分配内存而不必担心自己释放它：系统会自动处理。
但是底层实现不同，在这个特定的一个重要方面有很大不同：您使用的 Python 版本有一个 引用计数 GC。您使用的 Go 系统没有。

有了引用计数，Python 字符串处理程序的内部位就可以做到这一点。我将其表示为 Go（或至少 pseudo-Go），尽管实际的 Python 实现是在 C 中并且我没有正确排列所有细节：

// add (append) new string t to existing string s
func add_to_string(s, t string_header) string_header {
    need = s.len + t.len
    if s.refcount == 1 { // can modify string in-place
        data = s.data
        if cap(data) >= need {
            copy_into(data + s.len, t.data, t.len)
            return s
        }
    }
    // s is shared or s.cap < need

    new_s := make_new_string(roundup(need))
    // important: new_s has extra space for the next call to add_to_string

    copy_into(new_s.data, s.data, s.len)
    copy_into(new_s.data + s.len, t.data, t.len)
    s.refcount--
    if s.refcount == 0 {
        gc_release_string(s)
    }
    return new_s
}

通过 over-allocating——将 need 值四舍五入，使 cap(new_s) 变大——我们得到大约 log₂(n) 次调用到分配器，其中 n 是您执行的次数 s += "a"。 n 为 1000000（一百万），这大约是我们实际必须调用 make_new_string 函数并释放（出于 gc 目的，因为收集器使用 refcounts 作为第一遍）旧字符串 s 的大约 20 次.

[编辑：您的来源考古导致 commit 2c9c7a5f33d, which suggests less than doubling but still a multiplicative increase. To other readers, see comment。]

当前的 Go 实现分配的字符串没有单独的容量 header 字段（参见 reflect.StringHeader 并注意 "don't depend on this, it might be different in future implementations" 的重要警告）。在缺少引用计数之间——我们无法在添加两个字符串的运行时例程中判断目标只有一个引用——以及无法观察到 cap(s)（或 cap(s.data)）的等价物，Go 运行时必须每次都创建一个新字符串。那是一百万个内存分配。

为了证明 Python 代码确实使用了引用计数，请使用您原来的 Python:

s = ""

for i in range(1000000):
    s += "a"

并添加第二个变量 t，如下所示：

s = ""
t = s

for i in range(1000000):
    s += "a"
    t = s

执行时间的差异令人印象深刻：

$ time python test2.py
        0.68 real         0.65 user         0.03 sys
$ time python test3.py
       34.60 real        34.08 user         0.51 sys

修改后的 Python 程序在同一系统上仍然优于 Go (1.13.5)：

$ time ./test2
       67.32 real       103.27 user        13.60 sys

我没有进一步深入细节，但我怀疑 Go GC 运行比 Python 更积极。 Go GC 在内部非常不同，需要写屏障和偶尔的 "stop the world" 行为（所有不执行 GC 工作的 goroutines）。 Python GC 的引用计数性质允许它永远不会停止：即使引用计数为 2，t 上的引用计数也会下降到 1，然后下一次分配给 t 时会将其下降到零，在主循环的下一次旅行中释放 re-use 的内存块。所以它可能一遍又一遍地拾取相同的内存块。

（如果我没记错的话，Python的"over-allocate strings and check the refcount to allow expand-in-place"技巧并不是在[=80=的所有版本中都有]。它可能是在Python 2.4左右首先添加的左右。这段记忆极其模糊，快速 Google 搜索没有找到任何证据。[编辑：Python 2.7.4，显然。])

Python 中的字符串连接比 Go 快得多

String concatenation much faster in Python than Go

python

performance

string-concatenation

go