X509 certificate exception while crawling some URLs with StormCrawler

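I have been using StormCrawler to crawl websites. For HTTPS, I use the default https protocol implementation configured in StormCrawler. However, when I crawl some websites, I get the following exception: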

Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:141) ~[?:1.8.0_131]
at sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:126) ~[?:1.8.0_131]
at java.security.cert.CertPathBuilder.build(CertPathBuilder.java:280) ~[?:1.8.0_131]
at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:382) ~[?:1.8.0_131]
at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:292) ~[?:1.8.0_131]
at sun.security.validator.Validator.validate(Validator.java:260) ~[?:1.8.0_131]
at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:324) ~[?:1.8.0_131]
at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:229) ~[?:1.8.0_131]
at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:124) ~[?:1.8.0_131]
at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1496) ~[?:1.8.0_131]
... 20 more

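Is there a mechanism for downloading the certificates automatically and setting up the crawler with them? How should I configure the crawler?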

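This problem is not specific to StormCrawler. This answer explains that you can import the certificate manually, which is not really an option unless you are crawling that one site specifically. The other option is to disable certificate validation. That would require modifying the protocol implementation, but it should be doable.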
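To make the "disable certificate validation" option concrete, here is a minimal sketch using only the standard JSSE API. It is not StormCrawler code: the class name TrustAllSsl and the method createTrustAllContext are hypothetical, and where the resulting SSLContext gets plugged in depends on the protocol implementation you modify. It accepts every certificate, so it removes protection against man-in-the-middle attacks and is only reasonable when you care about fetching content rather than verifying the server's identity.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

// Hypothetical helper, not part of StormCrawler.
public class TrustAllSsl {

    // Trust manager that accepts every certificate chain. INSECURE by design.
    static final X509TrustManager TRUST_ALL = new X509TrustManager() {
        @Override
        public void checkClientTrusted(X509Certificate[] chain, String authType) {
            // no validation
        }

        @Override
        public void checkServerTrusted(X509Certificate[] chain, String authType) {
            // no validation: this is the check that fails in the stack trace above
        }

        @Override
        public X509Certificate[] getAcceptedIssuers() {
            return new X509Certificate[0];
        }
    };

    // Builds an SSLContext whose only trust manager is TRUST_ALL.
    static SSLContext createTrustAllContext() throws GeneralSecurityException {
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, new TrustManager[] { TRUST_ALL }, new SecureRandom());
        return ctx;
    }

    public static void main(String[] args) throws Exception {
        SSLContext ctx = createTrustAllContext();
        System.out.println("Created SSLContext for protocol " + ctx.getProtocol());
    }
}
```

With Apache HttpClient 4.4 or later, such a context can be handed to HttpClientBuilder.setSSLContext(...); with OkHttp, the socket factory and trust manager go to OkHttpClient.Builder.sslSocketFactory(...). Hostname verification is a separate check and may need relaxing as well.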

Have you tried the OKHttp implementation? It might behave differently from the Apache HttpClient one. See the okhttp wiki.
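Switching protocol implementations is a configuration change rather than a code change. As a sketch, assuming the com.digitalpebble package naming of the StormCrawler releases from around the time of this question (newer releases use a different package prefix, so check the class name against your version), the crawler configuration would contain something like:

```yaml
# Assumed keys and class name; verify against your StormCrawler version.
http.protocol.implementation: "com.digitalpebble.stormcrawler.protocol.okhttp.HttpProtocol"
https.protocol.implementation: "com.digitalpebble.stormcrawler.protocol.okhttp.HttpProtocol"
```

This only changes which HTTP client is used. Whether it avoids the certificate error depends on how that client validates certificates, so it is worth trying but not guaranteed to help.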