Getting java.net.SocketTimeoutException: connect timed out during AWS CodeBuild

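During acceptance tests in AWS CodeBuild, we are able to download a .jar in the pipeline, but the command that invokes the .jar fails to execute (the URL and IP in this example have been changed for obfuscation purposes):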

[Container] 2020/07/08 14:53:37 Running command java -jar qa-jenkins-cli.jar -s https://example.com/jenkins/ -noCertificateCheck build RUN-l1-Regression -s -v -p ReasonForRun="AWS pipeline run" -p slavepool="DI" -p HOST_VALUES="127.0.0.1 sp.l1.example.com"
Skipping HTTPS certificate checks altogether. Note that this is not secure at all.
java.net.SocketTimeoutException: connect timed out
    at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
    at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
    at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
    at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:403)
    at java.base/java.net.Socket.connect(Socket.java:609)
    at hudson.cli.CLI.connectViaCliPort(CLI.java:210)
    at hudson.cli.CLI.<init>(CLI.java:128)
    at hudson.cli.CLIConnectionFactory.connect(CLIConnectionFactory.java:72)
    at hudson.cli.CLI._main(CLI.java:479)
    at hudson.cli.CLI.main(CLI.java:390)
    Suppressed: java.io.EOFException: unexpected stream termination
        at hudson.remoting.ChannelBuilder.negotiate(ChannelBuilder.java:331)
        at hudson.remoting.Channel.<init>(Channel.java:422)
        at hudson.remoting.Channel.<init>(Channel.java:401)
        at hudson.remoting.Channel.<init>(Channel.java:397)
        at hudson.remoting.Channel.<init>(Channel.java:386)
        at hudson.remoting.Channel.<init>(Channel.java:378)
        at hudson.remoting.Channel.<init>(Channel.java:354)
        at hudson.cli.CLI.connectViaHttp(CLI.java:159)
        at hudson.cli.CLI.<init>(CLI.java:132)
        ... 3 more

[Container] 2020/07/08 14:54:01 Command did not exit successfully java -jar qa-jenkins-cli.jar -s https://example.com/jenkins/ -noCertificateCheck build RUN-l1-Regression -s -v -p ReasonForRun="AWS pipeline run" -p slavepool="DI" -p HOST_VALUES="127.0.0.1 sp.l1.example.com" exit status 255
[Container] 2020/07/08 14:54:01 Phase complete: PRE_BUILD State: FAILED
[Container] 2020/07/08 14:54:01 Phase context status code: COMMAND_EXECUTION_ERROR Message: Error while executing command: java -jar qa-jenkins-cli.jar -s https://example.com/jenkins/ -noCertificateCheck build RUN-l1-Regression -s -v -p ReasonForRun="AWS pipeline run" -p slavepool="DI" -p HOST_VALUES="127.0.0.1 sp.l1.example.com". Reason: exit status 255

Here is app-test-buildspec.yml (the wget works):

# build spec version. Keep at 0.2
# https://docs.aws.amazon.com/codebuild/latest/userguide/build-spec-ref.html#build-spec-ref-versions
version: 0.2

phases:
  pre_build:
    commands:
      #- echo "Installing jq (JSON parser)..."
      #- yum install -y jq gettext
      - echo "deploy_phase=${deploy_phase} developer_prefix=${developer_prefix} environment=${environment} account_id=${account_id} account_alias=${account_alias}"
      - $(cat version.json | jq -j '"export app_name=\(.app_name) app_version=\(.app_version) s3_version=\(.s3_version)"')
      - echo "app_name=${app_name} app_version=${app_version} s3_version=${s3_version} developer_prefix=${developer_prefix} environment=${environment}"
      - $(cat app-deploy.json | jq -j '"export UseFargate=\(.Parameters.UseFargate)"')
      - echo "UseFargate=${UseFargate}"
      - wget https://example.com/jenkins/jenkins-cli.jar -O qa-jenkins-cli.jar
      - java -jar qa-jenkins-cli.jar -s https://example.com/jenkins/ -noCertificateCheck build RUN-l1-Regression -s -v -p ReasonForRun="AWS pipeline run" -p slavepool="DI" -p HOST_VALUES="127.0.0.1 sp.l1.example.com"
  build:
    commands:
      - pip install boto3 pytest
      - pytest -o log_cli=true -o log_cli_level=INFO -v tests/test_ecs_cluster.py

artifacts:
  files:
    - '**/*'

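We have DNS mirroring set up so that certain AWS processes can reach on-premises services, such as the test suite we are trying to run here. Because of the mirroring, the tests run inside a VPC. We know the mirroring works because the wget that retrieves the .jar file succeeds. We don't see this call anywhere in the flow logs.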

Does anyone know what is going on here?

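We found that the test .jar was trying to run the tests on a different on-premises device that sits behind a firewall. The command's requests were dropped at that firewall, so the only response we ever got was the timeout.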

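Lesson learned: if you run a hybrid system that combines AWS and on-premises resources, you must know exactly which resources are needed and where they live. In a large system, process documentation may be inaccurate or missing altogether. You need good tooling to track down the point at which things go wrong (Wireshark was the lifesaver here) so that you can work out how to remediate.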
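
For anyone hitting something similar: in the stack trace above the timeout is raised under hudson.cli.CLI.connectViaCliPort, which is a plain TCP connect, while the wget over HTTPS succeeded. A quick way to narrow this down from inside the build container is to probe each host and port the job depends on and see whether the connect succeeds, is refused, or just times out (a silent timeout is what a firewall drop tends to look like). Below is a minimal sketch in Python, which the build image already has. The names `probe` and `check_all` are made up for illustration, and the ports you pass in are assumptions you would replace with whatever your Jenkins and test hosts actually use.

```python
import socket


def probe(host, port, timeout=5.0):
    """Attempt a single TCP connect and classify the outcome as a string."""
    try:
        # create_connection resolves the name and connects, honoring the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except (socket.timeout, TimeoutError):
        # No answer before the deadline: consistent with packets being
        # silently dropped somewhere along the path (for example a firewall).
        return "timeout"
    except ConnectionRefusedError:
        # The host answered but nothing is listening on that port.
        return "refused"
    except OSError as exc:
        # DNS failures, unreachable networks, and similar errors.
        return "error: " + str(exc)


def check_all(targets, timeout=5.0):
    """Probe every (host, port) pair and return a {(host, port): result} dict."""
    return {(host, port): probe(host, port, timeout) for host, port in targets}
```

For example, calling `check_all([("example.com", 443), ("sp.l1.example.com", 443)])` in a pre_build step before the java -jar command would show which endpoints are reachable from the CodeBuild VPC. Port 443 here is only a guess; the list should contain every host and port the test run really needs.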