Getting java.net.SocketTimeoutException: connect timed out during AWS CodeBuild
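During acceptance testing in AWS CodeBuild, we are able to request a .jar for the pipeline, but the command that calls the .jar fails to execute (the URLs and IPs in this example have been altered for obfuscation):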
[Container] 2020/07/08 14:53:37 Running command java -jar qa-jenkins-cli.jar -s https://example.com/jenkins/ -noCertificateCheck build RUN-l1-Regression -s -v -p ReasonForRun="AWS pipeline run" -p slavepool="DI" -p HOST_VALUES="127.0.0.1 sp.l1.example.com"
Skipping HTTPS certificate checks altogether. Note that this is not secure at all.
java.net.SocketTimeoutException: connect timed out
at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:403)
at java.base/java.net.Socket.connect(Socket.java:609)
at hudson.cli.CLI.connectViaCliPort(CLI.java:210)
at hudson.cli.CLI.<init>(CLI.java:128)
at hudson.cli.CLIConnectionFactory.connect(CLIConnectionFactory.java:72)
at hudson.cli.CLI._main(CLI.java:479)
at hudson.cli.CLI.main(CLI.java:390)
Suppressed: java.io.EOFException: unexpected stream termination
at hudson.remoting.ChannelBuilder.negotiate(ChannelBuilder.java:331)
at hudson.remoting.Channel.<init>(Channel.java:422)
at hudson.remoting.Channel.<init>(Channel.java:401)
at hudson.remoting.Channel.<init>(Channel.java:397)
at hudson.remoting.Channel.<init>(Channel.java:386)
at hudson.remoting.Channel.<init>(Channel.java:378)
at hudson.remoting.Channel.<init>(Channel.java:354)
at hudson.cli.CLI.connectViaHttp(CLI.java:159)
at hudson.cli.CLI.<init>(CLI.java:132)
... 3 more
[Container] 2020/07/08 14:54:01 Command did not exit successfully java -jar qa-jenkins-cli.jar -s https://example.com/jenkins/ -noCertificateCheck build RUN-l1-Regression -s -v -p ReasonForRun="AWS pipeline run" -p slavepool="DI" -p HOST_VALUES="127.0.0.1 sp.l1.example.com" exit status 255
[Container] 2020/07/08 14:54:01 Phase complete: PRE_BUILD State: FAILED
[Container] 2020/07/08 14:54:01 Phase context status code: COMMAND_EXECUTION_ERROR Message: Error while executing command: java -jar qa-jenkins-cli.jar -s https://example.com/jenkins/ -noCertificateCheck build RUN-l1-Regression -s -v -p ReasonForRun="AWS pipeline run" -p slavepool="DI" -p HOST_VALUES="127.0.0.1 sp.l1.example.com". Reason: exit status 255
Here is the app-test-buildspec.yml (the wget works):
# build spec version. keep at 0.2
# https://docs.aws.amazon.com/codebuild/latest/userguide/build-spec-ref.html#build-spec-ref-versions
version: 0.2
phases:
  pre_build:
    commands:
      #- echo "Installing jq (JSON parser)..."
      #- yum install -y jq gettext
      - echo "deploy_phase=${deploy_phase} developer_prefix=${developer_prefix} environment=${environment} account_id=${account_id} account_alias=${account_alias}"
      - $(cat version.json | jq -j '"export app_name=\(.app_name) app_version=\(.app_version) s3_version=\(.s3_version)"')
      - echo "app_name=${app_name} app_version=${app_version} s3_version=${s3_version} developer_prefix=${developer_prefix} environment=${environment}"
      - $(cat app-deploy.json | jq -j '"export UseFargate=\(.Parameters.UseFargate)"')
      - echo "UseFargate=${UseFargate}"
      - wget https://example.com/jenkins/jenkins-cli.jar -O qa-jenkins-cli.jar
      - java -jar qa-jenkins-cli.jar -s https://example.com/jenkins/ -noCertificateCheck build RUN-l1-Regression -s -v -p ReasonForRun="AWS pipeline run" -p slavepool="DI" -p HOST_VALUES="127.0.0.1 sp.l1.example.com"
  build:
    commands:
      - pip install boto3 pytest
      - pytest -o log_cli=true -o log_cli_level=INFO -v tests/test_ecs_cluster.py
artifacts:
  files:
    - '**/*'
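We have DNS mirroring set up so that some AWS processes can reach on-premises services, such as the test suite we are trying to run here. Because of the mirroring, the tests run in a VPC. We know the mirroring is working because the wget that retrieves the .jar file succeeds. We don't see this call anywhere in the flow logs.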
Does anyone know what is going on here?
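We found that the test .jar file was trying to run the tests on a different on-premises device, one that sits behind a firewall. The command's requests were dropped at that firewall, so nothing came back except the timeout.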
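In the stack trace, the timeout is raised inside hudson.cli.CLI.connectViaCliPort, which is a socket connection of its own and not the HTTPS download that wget performed, so a working wget says nothing about whether that second connection can get through. A dropped connection and a rejected one look different from the client side, and that difference can be checked from the build container. The following is only a minimal sketch, assuming Python is available in the build image (the buildspec already runs pip and pytest). The function name probe_tcp and its result labels are invented for illustration and are not part of Jenkins or AWS tooling.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 5.0) -> str:
    """Classify one TCP connection attempt as open, refused, timeout, or error."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No reply at all: consistent with a firewall silently dropping packets.
        return "timeout"
    except ConnectionRefusedError:
        # An active rejection: the host answered, but nothing listens on the port.
        return "refused"
    except OSError as exc:
        # DNS failures, unreachable networks, and similar problems.
        return f"error: {exc}"


# Hypothetical usage; substitute the host and port the CLI actually connects to:
# print(probe_tcp("sp.l1.example.com", 443))
```

A "timeout" result for a host and port that should be reachable points at packet filtering somewhere on the path, which is what we eventually confirmed here.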
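Lessons learned: if you run a hybrid system that combines AWS and on-premises resources, you must know exactly which resources are needed and where they live. In a large system, process documentation may be inaccurate or nonexistent. You need good tooling to trace the point at which things go wrong (Wireshark was the lifesaver here) so that you can work out how to remediate.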
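One way to keep such an inventory honest is to encode it and check it at the start of the build, so that a missing network path fails fast with a readable message. This is a rough sketch and not what we actually ran: the Endpoint type, the function names, and any inventory entries are hypothetical, and it assumes Python is available in the build image.

```python
import socket
import sys
from typing import NamedTuple


class Endpoint(NamedTuple):
    name: str  # human-readable label used in the report
    host: str
    port: int


def unreachable(endpoints, timeout=3.0):
    """Return (endpoint, reason) pairs for endpoints that fail a TCP connect."""
    failures = []
    for ep in endpoints:
        try:
            with socket.create_connection((ep.host, ep.port), timeout=timeout):
                pass  # the handshake succeeded; nothing else to do
        except OSError as exc:
            # Covers timeouts, refusals, and DNS errors alike.
            failures.append((ep, type(exc).__name__))
    return failures


def preflight(endpoints, timeout=3.0):
    """Report unreachable endpoints on stderr; return 0 if all are reachable, else 1."""
    failures = unreachable(endpoints, timeout)
    for ep, reason in failures:
        print(f"UNREACHABLE {ep.name} {ep.host}:{ep.port} ({reason})", file=sys.stderr)
    return 1 if failures else 0
```

A script that ends with sys.exit(preflight(inventory)) could run as an early pre_build command, where inventory lists every on-premises host and port the test suite depends on.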