Constant resident memory increase in Go with multiple concurrent TLS dialers
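OK, this has been bugging me for weeks. I can't tell what I'm missing, where the leak is, or even whether there is one. My workload is fairly simple: take a list of URLs, start a pool of goroutines, pull URLs from a channel, and open TLS connections to them with tls.Dialer. Below are a snapshot of the steadily rising memory graph and a POC of my code.
My guess is that it has something to do with allocations made by the tls package, because memory only seems to climb the more "successful" URLs it connects to, i.e., if most of them fail to connect, I don't see a steady increase in memory.
Here is the pprof output from midway through a run: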
Showing nodes accounting for 190.70MB, 95.58% of 199.53MB total
Dropped 34 nodes (cum <= 1MB)
Showing top 20 nodes out of 77
flat flat% sum% cum cum%
51.52MB 25.82% 25.82% 51.52MB 25.82% runtime.malg
24.10MB 12.08% 37.90% 24.10MB 12.08% bytes.makeSlice
17.07MB 8.55% 46.45% 41.17MB 20.63% crypto/tls.(*Conn).readHandshake
15MB 7.52% 53.97% 78.85MB 39.52% crypto/tls.dial
11MB 5.51% 59.48% 11.50MB 5.76% net.(*netFD).connect
10MB 5.01% 64.50% 15.42MB 7.73% context.WithDeadline
9MB 4.51% 69.01% 9MB 4.51% net.newFD (inline)
8MB 4.01% 73.02% 10.84MB 5.43% time.AfterFunc
7MB 3.51% 76.53% 52.93MB 26.53% net.(*Dialer).DialContext
5.50MB 2.76% 79.28% 5.50MB 2.76% context.(*cancelCtx).Done
5MB 2.51% 81.79% 84.35MB 42.28% main.main.func3
5MB 2.51% 84.30% 5MB 2.51% net.(*netFD).connect.func2
4.50MB 2.26% 86.55% 4.50MB 2.26% time.goFunc
4MB 2.01% 88.56% 4MB 2.01% crypto/tls.Client (inline)
3.16MB 1.58% 90.14% 3.16MB 1.58% main.main
2.84MB 1.42% 91.56% 2.84MB 1.42% time.startTimer
2.50MB 1.25% 92.82% 2.50MB 1.25% crypto/aes.(*aesCipherGCM).NewGCM
2.50MB 1.25% 94.07% 2.50MB 1.25% net.(*Resolver).internetAddrList.func1
1.50MB 0.75% 94.82% 1.50MB 0.75% crypto/tls.(*Config).Clone
1.50MB 0.75% 95.58% 1.50MB 0.75% crypto/aes.newCipher
package main

import (
    "crypto/tls"
    "net"
    "sync"
    "time"
)

func connectToTarget(targetString string, dialer *net.Dialer, config *tls.Config) {
    tConn, err := tls.DialWithDialer(dialer, "tcp", targetString, config)
    if err == nil {
        //do something with connection
        tConn.Close()
    }
}

func main() {
    workers := 256 * 256 //65536
    tlsConfig := &tls.Config{
        InsecureSkipVerify: true,
    }
    dialer := &net.Dialer{
        FallbackDelay: -1,
        KeepAlive:     -1,
        Timeout:       time.Duration(60) * time.Second,
    }
    targetsChan := make(chan string, workers)
    var workerDone sync.WaitGroup
    workerDone.Add(workers)
    for i := 0; i < workers; i++ {
        go func(functionWg *sync.WaitGroup, dialer *net.Dialer, tlsConfig *tls.Config, targets chan string) {
            for targetToConnect := range targets {
                connectToTarget(targetToConnect, dialer, tlsConfig)
            }
            functionWg.Done()
        }(&workerDone, dialer, tlsConfig, targetsChan)
    }
    targets := []string{} //in the actual code this reads from a file containing the list since it is large
    for _, target := range targets {
        targetsChan <- target
    }
    close(targetsChan)
    workerDone.Wait()
}
Update:
Here is a comparison of the first pprof (taken 10 minutes in) with my last pprof, which was taken after memory had been climbing steadily for a while.
Showing nodes accounting for 329.76MB, 83.32% of 395.77MB total
Dropped 57 nodes (cum <= 1.98MB)
flat flat% sum% cum cum%
199.43MB 50.39% 50.39% 199.43MB 50.39% bytes.makeSlice
80.80MB 20.42% 70.81% 280.22MB 70.80% crypto/tls.(*Conn).readHandshake
28.02MB 7.08% 77.89% 28.02MB 7.08% crypto/tls.Client (inline)
18.01MB 4.55% 82.44% 18.01MB 4.55% crypto/aes.(*aesCipherGCM).NewGCM
11MB 2.78% 85.22% 11MB 2.78% crypto/aes.newCipher
9.50MB 2.40% 87.62% 9.50MB 2.40% crypto/tls.(*Config).Clone
-8MB 2.02% 85.60% 15.53MB 3.92% crypto/tls.dial
-5.50MB 1.39% 84.21% -5.50MB 1.39% net.(*netFD).connect
-5MB 1.26% 82.94% -5.50MB 1.39% context.WithDeadline
-4.50MB 1.14% 81.81% -11.50MB 2.91% net.(*Dialer).DialContext
3.50MB 0.88% 82.69% 3.50MB 0.88% net.sockaddrToTCP
-3MB 0.76% 81.93% -3MB 0.76% time.AfterFunc
2MB 0.51% 82.44% 17.53MB 4.43% main.serverCert
1.50MB 0.38% 82.82% 2MB 0.51% crypto/tls.(*cipherSuiteTLS13).expandLabel
1MB 0.25% 83.07% 18MB 4.55% crypto/tls.aeadAESGCM
1MB 0.25% 83.32% 10MB 2.53% crypto/tls.aeadAESGCMTLS13
0.50MB 0.13% 83.45% 201.93MB 51.02% crypto/tls.(*Conn).readRecordOrCCS
-0.50MB 0.13% 83.32% -2MB 0.51% net.(*sysDialer).dialSingle
0 0% 83.32% 118.09MB 29.84% bytes.(*Buffer).Grow (inline)
0 0% 83.32% 81.33MB 20.55% bytes.(*Buffer).Write
0 0% 83.32% 199.43MB 50.39% bytes.(*Buffer).grow
0 0% 83.32% 11MB 2.78% crypto/aes.NewCipher
0 0% 83.32% 18.01MB 4.55% crypto/cipher.NewGCM (inline)
0 0% 83.32% 18.01MB 4.55% crypto/cipher.newGCMWithNonceAndTagSize
0 0% 83.32% 318.24MB 80.41% crypto/tls.(*Conn).Handshake
0 0% 83.32% 318.24MB 80.41% crypto/tls.(*Conn).clientHandshake
0 0% 83.32% 3.01MB 0.76% crypto/tls.(*Conn).readChangeCipherSpec (inline)
0 0% 83.32% 118.09MB 29.84% crypto/tls.(*Conn).readFromUntil
0 0% 83.32% 198.92MB 50.26% crypto/tls.(*Conn).readRecord (inline)
0 0% 83.32% 11.58MB 2.93% crypto/tls.(*Conn).retryReadRecord
0 0% 83.32% 154.61MB 39.06% crypto/tls.(*clientHandshakeState).doFullHandshake
0 0% 83.32% 22.51MB 5.69% crypto/tls.(*clientHandshakeState).establishKeys
0 0% 83.32% 180.12MB 45.51% crypto/tls.(*clientHandshakeState).handshake
0 0% 83.32% 3.01MB 0.76% crypto/tls.(*clientHandshakeState).readFinished
0 0% 83.32% 12MB 3.03% crypto/tls.(*clientHandshakeStateTLS13).establishHandshakeKeys
0 0% 83.32% 117.50MB 29.69% crypto/tls.(*clientHandshakeStateTLS13).handshake
0 0% 83.32% 92.92MB 23.48% crypto/tls.(*clientHandshakeStateTLS13).readServerCertificate
0 0% 83.32% 11.58MB 2.93% crypto/tls.(*clientHandshakeStateTLS13).readServerParameters
0 0% 83.32% 10.50MB 2.65% crypto/tls.(*halfConn).setTrafficSecret
0 0% 83.32% 15.53MB 3.92% crypto/tls.DialWithDialer (inline)
0 0% 83.32% 3MB 0.76% crypto/tls.cipherAES
0 0% 83.32% 318.24MB 80.41% crypto/tls.dial.func2
0 0% 83.32% 17.53MB 4.43% main.main.func3
0 0% 83.32% -2MB 0.51% net.(*sysDialer).dialSerial
0 0% 83.32% -2MB 0.51% net.internetSocket
0 0% 83.32% -2MB 0.51% net.socket
Here is a flame graph of the same data:
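The biggest offender is bytes.makeSlice, which is called during the handshake read. This might mean that the buffers are retained every time a goroutine calls tls.DialWithDialer to connect to a URL. That would surprise me, because I would expect the Close() method to release those buffers.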
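It turns out that the code in //do something with connection matters more than I thought. Even at the tls.Dial level, you have to read the "body". My now obviously wrong assumption was that tls.Dial merely establishes the connection, and that since the GET / HTTP 1.1 request had not been sent yet, there was no data to read off the wire. This left all those buffers full of server responses sitting idle.
_, _ = ioutil.ReadAll(tConn)
fixed everything in one line. I feel smarter and dumber at the same time. As a side note, at this level ReadAll() can hang for a long time if the server is slow to respond. tConn.SetReadDeadline(time.Now().Add(time.Second * timeout)) solved that as well.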