为什么 KafkaUtils.createStream() 的 "topics" 参数是 Map 而不是数组?

Why is the "topics" argument of KafkaUtils.createStream() a Map rather then array?

文档中的定义:

org.apache.spark.streaming.kafka

Class KafkaUtils

static JavaPairReceiverInputDStream<String,String> createStream(JavaStreamingContext jssc, String zkQuorum, String groupId, java.util.Map<String,Integer> topics)

Create an input stream that pulls messages from Kafka Brokers.

为什么主题是一个映射(而不是字符串数组)?

我了解到字符串键是主题名称。但是整数值呢?我应该填写什么?

阅读 Javadoc:

public static JavaPairReceiverInputDStream createStream(JavaStreamingContext jssc, String zkQuorum, String groupId, java.util.Map topics)

Create an input stream that pulls messages from Kafka Brokers. Storage level of the data will be the default StorageLevel.MEMORY_AND_DISK_SER_2.

Parameters: jssc - JavaStreamingContext object

zkQuorum - Zookeeper quorum (hostname:port,hostname:port,..)

groupId - The group id for this consumer

topics - Map of (topic_name -> numPartitions) to consume. Each partition is consumed in its own thread

Returns: DStream of (Kafka message key, Kafka message value)

Map的值是给定主题名称的分区数,它决定了将用于消费该主题的线程数。

来自 Javadoc:https://spark.apache.org/docs/1.3.0/api/java/index.html?org/apache/spark/streaming/kafka/KafkaUtils.html

topics - 要使用的 (topic_name -> numPartitions) 的映射。每个分区都在自己的线程中使用

所以每个数字都是您要用于该主题的分区数

如果您查看 KafkaUtils herecreateStream 方法的文档,您会看到

topics - Map of (topic_name -> numPartitions) to consume. Each partition is consumed in its own thread

整数值是作为映射中键的一部分的主题的分区数