如何为实时数据流配置 Apache Flink Cluster (flink-conf.yml)

How to configure Apache Flink Cluster (flink-conf.yml) for real time data stream

请帮帮我, 我有一个 Apache Flink 集群(2 个作业管理器,3 个任务管理器),但我不知道在 flink-conf.yml:

中为该参数设置哪些值

jobmanager.heap.size

taskmanager.heap.size

taskmanager.numberOfTaskSlots

parallelism.default

Job Manager 机器有:8CPU,32GB RAM
任务管理器机器有:8CPU,32GB RAM

我计划 运行 在此集群上执行 15..20 个 Apache Flink 作业。由于隐私政策,我不能在这里写java代码,所以我会尽量用文字来表达。

预计每天将有超过5000万个事件到来。所有职位都将有一个数据源。

我会考虑使用资源管理器来点赞YARN, Mesos, or Kubernetes in order to have high availability. In a nutshell, this is what they do for you:

When deploying a Flink application, Flink automatically identifies the required resources based on the application’s configured parallelism and requests them from the resource manager. In case of a failure, Flink replaces the failed container by requesting new resources. All communication to submit or control an application happens via REST calls. This eases the integration of Flink in many environments.

换句话说,他们可以将需要的集群资源提供给link引擎。并且您可以更轻松地配置您要查找的参数。