在 docker standalone flink 中启动任务时出现问题
Problem when starting tasks in docker standalone flink
我们在 Flink 中开发了一个系统,该系统从目录中读取文件,按客户端对它们进行分组,并根据信息类型将这些文件推送到接收器。
我们通过在我们的机器上本地安装 Flink 来做到这一点,并且它可以正常工作。但是,当我们 docker 化项目时,我们的作业已正确提交并且 UI 显示为 运行,但作业实际上从未启动过。在 UI 中,当我们进入细节时,它是这样显示的:
Flink dashboard screenshot
作业中的日志显示如下:
2021-10-15 12:13:50,247 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Running initialization on master for job Decibel Processor (d7ea5533f968c1d27e06f45c94e5823a).
2021-10-15 12:13:50,247 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Successfully ran initialization on master in 0 ms.
2021-10-15 12:13:50,269 INFO org.apache.flink.runtime.scheduler.adapter.DefaultExecutionTopology [] - Built 1 pipelined regions in 1 ms
2021-10-15 12:13:50,279 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - No state backend has been configured, using default (HashMap) org.apache.flink.runtime.state.hashmap.HashMapStateBackend@3ae62e37
2021-10-15 12:13:50,280 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Checkpoint storage is set to 'jobmanager'
2021-10-15 12:13:50,296 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - No checkpoint found during restore.
2021-10-15 12:13:50,304 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Using failover strategy org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy@3370d640 for Decibel Processor (d7ea5533f968c1d27e06f45c94e5823a).
2021-10-15 12:13:50,319 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Starting execution of job Decibel Processor (d7ea5533f968c1d27e06f45c94e5823a) under job master id 00000000000000000000000000000000.
2021-10-15 12:13:50,322 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Starting scheduling with scheduling strategy [org.apache.flink.runtime.scheduler.strategy.PipelinedRegionSchedulingStrategy]
2021-10-15 12:13:50,323 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Job Decibel Processor (d7ea5533f968c1d27e06f45c94e5823a) switched from state CREATED to RUNNING.
2021-10-15 12:13:50,333 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: Property Config Stream (1/1) (1577d0fa1357c72c49057d4cdd7e2ddd) switched from CREATED to SCHEDULED.
2021-10-15 12:13:50,333 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: Custom File Source (1/1) (73a31f7f635a874d2f8df49e7184b83d) switched from CREATED to SCHEDULED.
2021-10-15 12:13:50,333 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Collection File Input -> Map -> Collection Messages -> Filter Collection Messages -> WaterMark Filter Message (1/1) (55cf6f5740276160f44cc651c0971af8) switched from CREATED to SCHEDULED.
2021-10-15 12:13:50,333 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Session Window -> Filter -> Map -> Sink: Unnamed (1/1) (0d5400e22e1a0b8491f0a98ffcf3ac12) switched from CREATED to SCHEDULED.
2021-10-15 12:13:50,333 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Keyed Reduce -> Stats output -> Sink: Unnamed (1/1) (f2fefa969c0f63f43c817e4c981b6edf) switched from CREATED to SCHEDULED.
2021-10-15 12:13:50,334 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Sink: Storage side output (1/1) (b3204e02a57a8bc0b04a63f044050fd6) switched from CREATED to SCHEDULED.
2021-10-15 12:13:50,334 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Sink: In progress sessions side output (1/1) (9def69110da478c06dfc1636c76349aa) switched from CREATED to SCHEDULED.
2021-10-15 12:13:50,334 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Sink: In progress page views side output (1/1) (cec4378c07fe32c56c9db45ae144079b) switched from CREATED to SCHEDULED.
2021-10-15 12:13:50,361 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Connecting to ResourceManager akka.tcp://flink@jobmanager:6123/user/rpc/resourcemanager_*(00000000000000000000000000000000)
2021-10-15 12:13:50,366 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Resolved ResourceManager address, beginning registration
2021-10-15 12:13:50,369 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering job manager 00000000000000000000000000000000@akka.tcp://flink@jobmanager:6123/user/rpc/jobmanager_2 for job d7ea5533f968c1d27e06f45c94e5823a.
2021-10-15 12:13:50,375 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registered job manager 00000000000000000000000000000000@akka.tcp://flink@jobmanager:6123/user/rpc/jobmanager_2 for job d7ea5533f968c1d27e06f45c94e5823a.
2021-10-15 12:13:50,378 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - JobManager successfully registered at ResourceManager, leader id: 00000000000000000000000000000000.
2021-10-15 12:13:50,381 INFO org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager [] - Received resource requirements from job d7ea5533f968c1d27e06f45c94e5823a: [ResourceRequirement{resourceProfile=ResourceProfile{UNKNOWN}, numberOfRequiredSlots=1}]
但是如果我们将一个文件添加到应该处理的文件夹中,则不会发生任何事情。
我们为此使用 docker-compose,因为它必须与与 Flink 无关的其他代码部分一起部署(我省略了不相关的部分)
short-term-processor:
# Define the name and version of the docker image
image: short-term-processor
container_name: short_term_processor
command: standalone-job --job-classname com.decibel.flink.processor.Main
restart: always
ports:
- "8081:8081"
volumes:
- /flink/flink-conf.yaml:/opt/flink/conf/flink-conf.yaml
- /flink/processor.properties:/opt/flink/conf/processor.properties
- /flink/data:/data1
配置 yaml 和属性与在我们的本地集群中正常工作的那些完全相同。
知道为什么会这样吗?
经过几次尝试,我们找到了问题和解决方案:独立 docker 作业只是提交作业但从未启动。
为了解决这个问题,我们需要创建 2 个额外的容器,一个用于作业管理器,一个用于任务管理器:
service:
# Define the name and version of the docker image
image: service
container_name: service
command: "flink run /opt/flink/usrlib/artifacts/service.jar --port 9000"
# Logs are pushed to syslog
logging:
driver: syslog
restart: always
ports:
- "9000:9000"
volumes:
- /etc/localtime:/etc/localtime:ro
- /var/log/decibel/:/opt/flink/log/
- /conf/flink-conf.yaml:/opt/flink/conf/flink-conf.yaml
- /conf/processor.properties:/opt/flink/conf/processor.properties
- /data1:/data1/decibelinsightdata
depends_on:
- jobmanager
- taskmanager
jobmanager:
image: apache/flink:1.13.0-scala_2.11-java11
command: "jobmanager.sh start-foreground"
ports:
- 8081:8081
volumes:
- /data1:/data1/decibelinsightdata
- /conf/flink-conf.yaml:/opt/flink/conf/flink-conf.yaml
environment:
- JOB_MANAGER_RPC_ADDRESS=jobmanager
taskmanager:
image: apache/flink:1.13.0-scala_2.11-java11
command: "taskmanager.sh start-foreground"
volumes:
- /data1:/data1/decibelinsightdata
- /conf/flink-conf.yaml:/opt/flink/conf/flink-conf.yaml
environment:
- JOB_MANAGER_RPC_ADDRESS=jobmanager
在此之后,作业开始没有任何问题
我们在 Flink 中开发了一个系统,该系统从目录中读取文件,按客户端对它们进行分组,并根据信息类型将这些文件推送到接收器。 我们通过在我们的机器上本地安装 Flink 来做到这一点,并且它可以正常工作。但是,当我们 docker 化项目时,我们的作业已正确提交并且 UI 显示为 运行,但作业实际上从未启动过。在 UI 中,当我们进入细节时,它是这样显示的: Flink dashboard screenshot
作业中的日志显示如下:
2021-10-15 12:13:50,247 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Running initialization on master for job Decibel Processor (d7ea5533f968c1d27e06f45c94e5823a).
2021-10-15 12:13:50,247 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Successfully ran initialization on master in 0 ms.
2021-10-15 12:13:50,269 INFO org.apache.flink.runtime.scheduler.adapter.DefaultExecutionTopology [] - Built 1 pipelined regions in 1 ms
2021-10-15 12:13:50,279 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - No state backend has been configured, using default (HashMap) org.apache.flink.runtime.state.hashmap.HashMapStateBackend@3ae62e37
2021-10-15 12:13:50,280 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Checkpoint storage is set to 'jobmanager'
2021-10-15 12:13:50,296 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - No checkpoint found during restore.
2021-10-15 12:13:50,304 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Using failover strategy org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy@3370d640 for Decibel Processor (d7ea5533f968c1d27e06f45c94e5823a).
2021-10-15 12:13:50,319 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Starting execution of job Decibel Processor (d7ea5533f968c1d27e06f45c94e5823a) under job master id 00000000000000000000000000000000.
2021-10-15 12:13:50,322 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Starting scheduling with scheduling strategy [org.apache.flink.runtime.scheduler.strategy.PipelinedRegionSchedulingStrategy]
2021-10-15 12:13:50,323 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Job Decibel Processor (d7ea5533f968c1d27e06f45c94e5823a) switched from state CREATED to RUNNING.
2021-10-15 12:13:50,333 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: Property Config Stream (1/1) (1577d0fa1357c72c49057d4cdd7e2ddd) switched from CREATED to SCHEDULED.
2021-10-15 12:13:50,333 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: Custom File Source (1/1) (73a31f7f635a874d2f8df49e7184b83d) switched from CREATED to SCHEDULED.
2021-10-15 12:13:50,333 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Collection File Input -> Map -> Collection Messages -> Filter Collection Messages -> WaterMark Filter Message (1/1) (55cf6f5740276160f44cc651c0971af8) switched from CREATED to SCHEDULED.
2021-10-15 12:13:50,333 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Session Window -> Filter -> Map -> Sink: Unnamed (1/1) (0d5400e22e1a0b8491f0a98ffcf3ac12) switched from CREATED to SCHEDULED.
2021-10-15 12:13:50,333 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Keyed Reduce -> Stats output -> Sink: Unnamed (1/1) (f2fefa969c0f63f43c817e4c981b6edf) switched from CREATED to SCHEDULED.
2021-10-15 12:13:50,334 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Sink: Storage side output (1/1) (b3204e02a57a8bc0b04a63f044050fd6) switched from CREATED to SCHEDULED.
2021-10-15 12:13:50,334 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Sink: In progress sessions side output (1/1) (9def69110da478c06dfc1636c76349aa) switched from CREATED to SCHEDULED.
2021-10-15 12:13:50,334 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Sink: In progress page views side output (1/1) (cec4378c07fe32c56c9db45ae144079b) switched from CREATED to SCHEDULED.
2021-10-15 12:13:50,361 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Connecting to ResourceManager akka.tcp://flink@jobmanager:6123/user/rpc/resourcemanager_*(00000000000000000000000000000000)
2021-10-15 12:13:50,366 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Resolved ResourceManager address, beginning registration
2021-10-15 12:13:50,369 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering job manager 00000000000000000000000000000000@akka.tcp://flink@jobmanager:6123/user/rpc/jobmanager_2 for job d7ea5533f968c1d27e06f45c94e5823a.
2021-10-15 12:13:50,375 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registered job manager 00000000000000000000000000000000@akka.tcp://flink@jobmanager:6123/user/rpc/jobmanager_2 for job d7ea5533f968c1d27e06f45c94e5823a.
2021-10-15 12:13:50,378 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - JobManager successfully registered at ResourceManager, leader id: 00000000000000000000000000000000.
2021-10-15 12:13:50,381 INFO org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager [] - Received resource requirements from job d7ea5533f968c1d27e06f45c94e5823a: [ResourceRequirement{resourceProfile=ResourceProfile{UNKNOWN}, numberOfRequiredSlots=1}]
但是如果我们将一个文件添加到应该处理的文件夹中,则不会发生任何事情。
我们为此使用 docker-compose,因为它必须与与 Flink 无关的其他代码部分一起部署(我省略了不相关的部分)
short-term-processor:
# Define the name and version of the docker image
image: short-term-processor
container_name: short_term_processor
command: standalone-job --job-classname com.decibel.flink.processor.Main
restart: always
ports:
- "8081:8081"
volumes:
- /flink/flink-conf.yaml:/opt/flink/conf/flink-conf.yaml
- /flink/processor.properties:/opt/flink/conf/processor.properties
- /flink/data:/data1
配置 yaml 和属性与在我们的本地集群中正常工作的那些完全相同。 知道为什么会这样吗?
经过几次尝试,我们找到了问题和解决方案:独立 docker 作业只是提交作业但从未启动。
为了解决这个问题,我们需要创建 2 个额外的容器,一个用于作业管理器,一个用于任务管理器:
service:
# Define the name and version of the docker image
image: service
container_name: service
command: "flink run /opt/flink/usrlib/artifacts/service.jar --port 9000"
# Logs are pushed to syslog
logging:
driver: syslog
restart: always
ports:
- "9000:9000"
volumes:
- /etc/localtime:/etc/localtime:ro
- /var/log/decibel/:/opt/flink/log/
- /conf/flink-conf.yaml:/opt/flink/conf/flink-conf.yaml
- /conf/processor.properties:/opt/flink/conf/processor.properties
- /data1:/data1/decibelinsightdata
depends_on:
- jobmanager
- taskmanager
jobmanager:
image: apache/flink:1.13.0-scala_2.11-java11
command: "jobmanager.sh start-foreground"
ports:
- 8081:8081
volumes:
- /data1:/data1/decibelinsightdata
- /conf/flink-conf.yaml:/opt/flink/conf/flink-conf.yaml
environment:
- JOB_MANAGER_RPC_ADDRESS=jobmanager
taskmanager:
image: apache/flink:1.13.0-scala_2.11-java11
command: "taskmanager.sh start-foreground"
volumes:
- /data1:/data1/decibelinsightdata
- /conf/flink-conf.yaml:/opt/flink/conf/flink-conf.yaml
environment:
- JOB_MANAGER_RPC_ADDRESS=jobmanager
在此之后,作业开始没有任何问题