JobManager doesn't automatically redirect all requests to the remaining / running TaskManager
Problem description
- 2 machines (203, 204)
- Set up an HA Flink v1.6.1 cluster in standalone mode
- Each machine runs a JobManager and a TaskManager (2 task slots)
- After starting a job on the JobManager node (the SocketWindowWordCount.jar example:
./flink run ../examples/streaming/SocketWindowWordCount.jar --hostname 10.1.2.9 --port 9000
), I killed the working TaskManager instance.
- In the web dashboard I can see the job being cancelled and then failing. (Web Dashboard image)
flink-conf.yaml
state.backend: filesystem
state.checkpoints.dir: hdfs://10.1.2.109:8020/wulin/flink-checkpoints
rest.port: 9081
blob.server.port: 6124
query.server.port: 6125
web.tmpdir: /home/flink/deploy/webTmp
web.log.path: /home/flink/deploy/log
io.tmp.dirs: /home/flink/deploy/taskManagerTmp
high-availability: zookeeper
high-availability.zookeeper.quorum: 10.0.1.79:2181
high-availability.zookeeper.path.root: /flink
high-availability.cluster-id: flink
high-availability.storageDir: hdfs://10.1.2.109:8020/wulin
security.kerberos.login.principal: xxxx
security.kerberos.login.keytab: /home/ctu/flink/flink-1.6/conf/user.keytab
Full logs
log-standalonesession-203
log-taskexecutor-203
log-standalonesession-204
Exception
After killing the working TM, I get an exception like this:
2018-12-28 11:04:27,877 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@hz203:42861] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@hz203:42861]] Caused by: [Connection refused: hz203/10.0.0.203:42861]
2018-12-28 11:04:28,660 WARN akka.remote.transport.netty.NettyTransport - Remote connection to [null] failed with java.net.ConnectException: Connection refused: hz203/10.0.0.203:42861
2018-12-28 11:04:28,660 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@hz203:42861] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@hz203:42861]] Caused by: [Connection refused: hz203/10.0.0.203:42861]
2018-12-28 11:04:28,678 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - The heartbeat of TaskManager with id 0f41bca09600cd25000e19801076fa1f timed out.
2018-12-28 11:04:28,678 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Closing TaskExecutor connection 0f41bca09600cd25000e19801076fa1f because: The heartbeat of TaskManager with id 0f41bca09600cd25000e19801076fa1f timed out.
2018-12-28 11:04:28,678 INFO org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Unregister TaskManager dcf3bb5b7ed2208cf45b658d212fd8d2 from the SlotManager.
2018-12-28 11:04:28,678 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Socket Stream -> Flat Map (1/1) (88aa62ad152f4df6b39a969dd32c0249) switched from RUNNING to FAILED.
org.apache.flink.util.FlinkException: The assigned slot 0f41bca09600cd25000e19801076fa1f_0 was removed.
at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.removeSlot(SlotManager.java:786)
at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.removeSlots(SlotManager.java:756)
at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.internalUnregisterTaskManager(SlotManager.java:948)
at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.unregisterTaskManager(SlotManager.java:372)
at org.apache.flink.runtime.resourcemanager.ResourceManager.closeTaskManagerConnection(ResourceManager.java:803)
at org.apache.flink.runtime.resourcemanager.ResourceManager$TaskManagerHeartbeatListener.run(ResourceManager.java:1116)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:332)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:158)
at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:70)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:142)
at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40)
at akka.actor.UntypedActor$$anonfun$receive.applyOrElse(UntypedActor.scala:165)
at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
at akka.actor.ActorCell.invoke(ActorCell.scala:495)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2018-12-28 11:04:28,680 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job Socket Window WordCount (61f55876e79934d515c163d095d706a6) switched from state RUNNING to FAILING.
Submitting the job
Running
./bin/flink run -d ./examples/streaming/SocketWindowWordCount.jar --port 9000 --hostname 10.1.2.9
produces the following JM log:
2018-12-28 19:20:01,354 INFO org.apache.flink.runtime.jobmaster.JobMaster - Starting execution of job Socket Window WordCount (5cdb91c15ee12ec6e74256eed10b5291)
2018-12-28 19:20:01,354 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job Socket Window WordCount (5cdb91c15ee12ec6e74256eed10b5291) switched from state CREATED to RUNNING.
2018-12-28 19:20:01,356 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Socket Stream -> Flat Map (1/1) (e30439b9f548c6013d8b8689e30d0dd7) switched from CREATED to SCHEDULED.
2018-12-28 19:20:01,359 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction, PassThroughWindowFunction) -> Sink: Print to Std. Out (1/1) (102d04f5aa6fc50cfe5088e20902c72e) switched from CREATED to SCHEDULED.
2018-12-28 19:20:01,364 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPool - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{e33a40832a3922897470fb76bcf76b29}]
2018-12-28 19:20:01,367 INFO org.apache.flink.runtime.jobmaster.JobMaster - Connecting to ResourceManager akka.tcp://flink@hz203:46596/user/resourcemanager(b22f96303e74df23645fe4567f884b9e)
2018-12-28 19:20:01,370 INFO org.apache.flink.runtime.jobmaster.JobMaster - Resolved ResourceManager address, beginning registration
2018-12-28 19:20:01,370 INFO org.apache.flink.runtime.jobmaster.JobMaster - Registration at ResourceManager attempt 1 (timeout=100ms)
2018-12-28 19:20:01,371 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Starting ZooKeeperLeaderRetrievalService /leader/5cdb91c15ee12ec6e74256eed10b5291/job_manager_lock.
2018-12-28 19:20:01,371 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Registering job manager 9a31e8b4e8dfbf7b31d6ed3d227648b6@akka.tcp://flink@hz203:46596/user/jobmanager_0 for job 5cdb91c15ee12ec6e74256eed10b5291.
2018-12-28 19:20:01,431 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Registered job manager 9a31e8b4e8dfbf7b31d6ed3d227648b6@akka.tcp://flink@hz203:46596/user/jobmanager_0 for job 5cdb91c15ee12ec6e74256eed10b5291.
2018-12-28 19:20:01,432 INFO org.apache.flink.runtime.jobmaster.JobMaster - JobManager successfully registered at ResourceManager, leader id: b22f96303e74df23645fe4567f884b9e.
2018-12-28 19:20:01,433 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPool - Requesting new slot [SlotRequestId{e33a40832a3922897470fb76bcf76b29}] and profile ResourceProfile{cpuCores=-1.0, heapMemoryInMB=-1, directMemoryInMB=0, nativeMemoryInMB=0, networkMemoryInMB=0} from resource manager.
2018-12-28 19:20:01,434 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Request slot with profile ResourceProfile{cpuCores=-1.0, heapMemoryInMB=-1, directMemoryInMB=0, nativeMemoryInMB=0, networkMemoryInMB=0} for job 5cdb91c15ee12ec6e74256eed10b5291 with allocation id AllocationID{f7a24e609e2ec618ccb456076049fa3b}.
2018-12-28 19:20:01,510 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Socket Stream -> Flat Map (1/1) (e30439b9f548c6013d8b8689e30d0dd7) switched from SCHEDULED to DEPLOYING.
2018-12-28 19:20:01,511 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Deploying Source: Socket Stream -> Flat Map (1/1) (attempt #0) to hz203
2018-12-28 19:20:01,515 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction, PassThroughWindowFunction) -> Sink: Print to Std. Out (1/1) (102d04f5aa6fc50cfe5088e20902c72e) switched from SCHEDULED to DEPLOYING.
2018-12-28 19:20:01,515 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Deploying Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction, PassThroughWindowFunction) -> Sink: Print to Std. Out (1/1) (attempt #0) to hz203
2018-12-28 19:20:01,674 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction, PassThroughWindowFunction) -> Sink: Print to Std. Out (1/1) (102d04f5aa6fc50cfe5088e20902c72e) switched from DEPLOYING to RUNNING.
2018-12-28 19:20:01,708 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Socket Stream -> Flat Map (1/1) (e30439b9f548c6013d8b8689e30d0dd7) switched from DEPLOYING to RUNNING.
2018-12-28 19:20:43,267 INFO org.apache.flink.runtime.blob.BlobClient - Downloading null/t-61808afb630553305c73a0a23f9231ffd6b2b448-513fbe1e6ddf69d10689eccf4c65da97 from hz203/10.0.0.203:6124
2018-12-28 19:20:48,339 INFO org.apache.flink.runtime.blob.BlobClient - Downloading null/t-dd915bb9821ff6ced34dd5e489966b674de5a48f-7ea2600930e5fc5a4fbb7d47ee198789 from hz203/10.0.0.203:6124
2018-12-28 19:20:52,623 INFO org.apache.flink.runtime.blob.BlobClient - Downloading null/t-61808afb630553305c73a0a23f9231ffd6b2b448-0bd1ab86fa4cc54daeb472079bfbea8c from hz203/10.0.0.203:6124
Killing the TM
The post body is limited to 30000 characters, so for the JM logs from the moment the TM is killed, please see: JMlogs
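The logs indicate that your RestartStrategy has exhausted its restart attempts, or that no RestartStrategy is configured. Please check whether you have specified a RestartStrategy, either in your program via env.setRestartStrategy(RestartStrategies.fixedDelayRestart(10, 0L)) or in flink-conf.yaml via restart-strategy: fixed-delay. If you want to learn more about Flink's restart strategies, see the documentation.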
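As a rough sketch of the flink-conf.yaml route, a fixed-delay strategy could be configured as shown here. The attempt count and delay are illustrative values, not taken from the question. The key names are the ones documented for Flink's fixed-delay restart strategy, but check them against the documentation for your version (1.6.1 here).

```yaml
# Assumed values for illustration: retry up to 3 times, waiting 10 s between attempts.
restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 3
restart-strategy.fixed-delay.delay: 10 s
```

flink-conf.yaml is read when the processes start, so the JobManagers would most likely need to be restarted for the change to take effect. Note that when checkpointing is not enabled and no strategy is set, Flink falls back to the "no restart" strategy, which would match the RUNNING to FAILING transition in the log above. With a restart strategy in place, and given that the surviving TaskManager still has 2 free slots while the job's tasks run with parallelism 1, the job should be rescheduled there instead of failing outright.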