Dead letters using Akka HTTP client and Akka Streams
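I'm trying to run a scraper using Akka HTTP and Akka Streams. I start with a set of index pages, parse the links out of them, then fetch each link and parse that page, which returns a set of individual links. So, something like this: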
fetch-top-level-page -> list-of-links-to-child-pages -> fetch-child-page -> list-of-links-in-child-page
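My problem is that I can't even fetch a single page. Every top-level URL I try to fetch ends up in dead letters, and nothing makes it any further.
In this sample code, all I'm trying to do is send an HttpRequest to the pool to be turned into an HttpResponse, and prove that it works by printing the contents to the screen.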
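import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.{HttpRequest, HttpResponse}
import akka.http.scaladsl.settings.ConnectionPoolSettings
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Keep, Sink, Source}
import scala.concurrent.Promise

implicit val system = ActorSystem("scraper")
implicit val ec = system.dispatcher
implicit val settings = system.settings
implicit val materializer = ActorMaterializer()
val requests = List(HttpRequest(...), HttpRequest(...))
val poolClientFlow = Http().superPool[Promise[HttpResponse]](settings = ConnectionPoolSettings(system).withMaxConnections(10))
Source(requests)
  .map(req => { println("-", req); req }) // this part runs fine
  .via(poolClientFlow)
  .map(resp => { println("|", resp); resp }) // this never runs
  .toMat(Sink.foreach { p => println(p) })(Keep.both)
  .run()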
This is what I get:
(-,(HttpRequest(...),Future(<not completed>)))
(-,(HttpRequest(...),Future(<not completed>)))
[INFO] [03/07/2018 15:10:03.681] [scraper-akka.actor.default-dispatcher-5] [akka://scraper/user/pool-master] Message [akka.http.impl.engine.client.PoolMasterActor$SendRequest] without sender to Actor[akka://scraper/user/pool-master#1333123700] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
[INFO] [03/07/2018 15:10:03.683] [scraper-akka.actor.default-dispatcher-5] [akka://scraper/user/pool-master] Message [akka.http.impl.engine.client.PoolMasterActor$SendRequest] without sender to Actor[akka://scraper/user/pool-master#1333123700] was not delivered. [2] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
I'm new to Akka and am obviously making some basic mistake, since this seems like exactly the use case Akka, Akka Streams, and Akka HTTP are meant for.
Any ideas?
I can't reproduce this with Akka HTTP 10.1.0-RC2 and Akka Streams 2.5.11. The following works:
val requests = List(
  (HttpRequest(uri = "http://akka.io"), Promise[HttpResponse]()),
  (HttpRequest(uri = "http://www.yahoo.com"), Promise[HttpResponse]())
)
val poolClientFlow =
  Http().superPool[Promise[HttpResponse]](settings = ConnectionPoolSettings(system).withMaxConnections(10))
Source(requests)
  .map { req => println("-", req); req }
  .via(poolClientFlow)
  .map { resp => println("|", resp); resp }
  .toMat(Sink.foreach(println))(Keep.both)
  .run()
// The following is printed:
// (-,(HttpRequest(HttpMethod(GET),http://akka.io,List(),HttpEntity.Strict(none/none,ByteString()),HttpProtocol(HTTP/1.1)),Future()))
// (-,(HttpRequest(HttpMethod(GET),http://www.yahoo.com,List(),HttpEntity.Strict(none/none,ByteString()),HttpProtocol(HTTP/1.1)),Future()))
// (|,(Success(HttpResponse(301 Moved Permanently,List(Date: Thu, 08 Mar 2018 14:33:46 GMT, Connection: keep-alive, Cache-Control: max-age=3600, Expires: Thu, 08 Mar 2018 15:33:46 GMT, Location: https://akka.io/, Server: cloudflare, CF-RAY: 3f8604d386979cf6-AMS),HttpEntity.Chunked(application/octet-stream),HttpProtocol(HTTP/1.1))),Future()))
// (Success(HttpResponse(301 Moved Permanently,List(Date: Thu, 08 Mar 2018 14:33:46 GMT, Connection: keep-alive, Cache-Control: max-age=3600, Expires: Thu, 08 Mar 2018 15:33:46 GMT, Location: https://akka.io/, Server: cloudflare, CF-RAY: 3f8604d386979cf6-AMS),HttpEntity.Chunked(application/octet-stream),HttpProtocol(HTTP/1.1))),Future())
// (|,(Success(HttpResponse(301 Moved Permanently,List(Date: Thu, 08 Mar 2018 14:33:46 GMT, Connection: keep-alive, Via: http/1.1 media-router-fp6.prod.media.ir2.yahoo.com (ApacheTrafficServer [c s f ]), Server: ATS, Cache-Control: no-store, no-cache, Content-Language: en, X-Frame-Options: SAMEORIGIN, Location: https://www.yahoo.com/),HttpEntity.Strict(text/html,ByteString(114, 101, 100, 105, 114, 101, 99, 116)),HttpProtocol(HTTP/1.1))),Future()))
// (Success(HttpResponse(301 Moved Permanently,List(Date: Thu, 08 Mar 2018 14:33:46 GMT, Connection: keep-alive, Via: http/1.1 media-router-fp6.prod.media.ir2.yahoo.com (ApacheTrafficServer [c s f ]), Server: ATS, Cache-Control: no-store, no-cache, Content-Language: en, X-Frame-Options: SAMEORIGIN, Location: https://www.yahoo.com/),HttpEntity.Strict(text/html,ByteString(114, 101, 100, 105, 114, 101, 99, 116)),HttpProtocol(HTTP/1.1))),Future())
// [WARN] [03/08/2018 14:46:31.003] [scraper-akka.actor.default-dispatcher-4] [scraper/Pool(shared->http://akka.io:80)] [0 (WaitingForResponseEntitySubscription)] Response entity was not subscribed after 1 second. Make sure to read the response entity body or call `discardBytes()` on it. GET / Empty -> 301 Moved Permanently Chunked
A possibly better approach is something like this (note the call to discardEntityBytes()):
Source(requests)
  .map { req => println("-", req); req }
  .via(poolClientFlow)
  .map { resp => println("|", resp); resp }
  .toMat(Sink.foreach({
    case ((util.Success(resp), p)) =>
      resp.discardEntityBytes() // drain the unread entity so the pool is not left waiting on it
      p.success(resp)
    case ((util.Failure(e), p)) => p.failure(e)
  }))(Keep.both)
  .run()
// The following is printed:
// (-,(HttpRequest(HttpMethod(GET),http://akka.io,List(),HttpEntity.Strict(none/none,ByteString()),HttpProtocol(HTTP/1.1)),Future()))
// (-,(HttpRequest(HttpMethod(GET),http://www.yahoo.com,List(),HttpEntity.Strict(none/none,ByteString()),HttpProtocol(HTTP/1.1)),Future()))
// (|,(Success(HttpResponse(301 Moved Permanently,List(Date: Thu, 08 Mar 2018 14:38:32 GMT, Connection: keep-alive, Via: http/1.1 media-router-fp21.prod.media.ir2.yahoo.com (ApacheTrafficServer [c s f ]), Server: ATS, Cache-Control: no-store, no-cache, Content-Language: en, X-Frame-Options: SAMEORIGIN, Location: https://www.yahoo.com/),HttpEntity.Strict(text/html,ByteString(114, 101, 100, 105, 114, 101, 99, 116)),HttpProtocol(HTTP/1.1))),Future()))
// (|,(Success(HttpResponse(301 Moved Permanently,List(Date: Thu, 08 Mar 2018 14:38:32 GMT, Connection: keep-alive, Cache-Control: max-age=3600, Expires: Thu, 08 Mar 2018 15:38:32 GMT, Location: https://akka.io/, Server: cloudflare, CF-RAY: 3f860bca84a32ba6-AMS),HttpEntity.Chunked(application/octet-stream),HttpProtocol(HTTP/1.1))),Future()))
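The pattern match in that sink copies each result into the Promise that travelled through the pool alongside its request, so whoever created the request can wait on the promise's future. Below is a minimal sketch of just that pairing, using only the Scala standard library and no Akka at all. The names PromisePairingSketch, PoolResult, and complete are hypothetical, and a String stands in for the HttpResponse.

```scala
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration._
import scala.util.{Failure, Success, Try}

object PromisePairingSketch {
  // Hypothetical stand-in for one element coming out of the pool flow:
  // the outcome of a request plus the Promise that was sent in with it.
  type PoolResult[T] = (Try[T], Promise[T])

  // Same shape as the pattern match in the sink: copy the outcome into the promise.
  def complete[T](result: PoolResult[T]): Unit = result match {
    case (Success(value), p) => p.success(value)
    case (Failure(e), p)     => p.failure(e)
  }

  def main(args: Array[String]): Unit = {
    val ok = Promise[String]()
    val failed = Promise[String]()

    complete((Success("301 Moved Permanently"), ok))
    complete((Failure(new RuntimeException("connection refused")), failed))

    // The caller only ever looks at the futures behind the promises.
    println(Await.result(ok.future, 1.second))
    println(Try(Await.result(failed.future, 1.second)).isFailure)
  }
}
```

In the real stream the Try would hold an HttpResponse. Because the entity has already been discarded by the time the promise is completed, this sketch assumes the caller only needs the status and headers.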