How to subscribe to a particular partition and read from a custom offset in Spark Structured Streaming?
I have a use case where multiple tables are published to the same topic but to different partitions. I want to read from one specific partition only, starting at a custom offset.
val data = sql.readStream.format("kafka")
  .option("kafka.bootstrap.servers", "servers")
  .option("assign", """{"TEST1":[0]}""")
  .option("startingOffsets", """{"TEST1":{"0":172260244}}""")
  .option("endingOffsets", """{"TEST1":{"0":-1}}""")
  .load()
When I subscribe this way, I get the error below. The topic name is being converted to lowercase automatically.
WARN org.apache.spark.sql.kafka010.KafkaSource - Error in attempt 1 getting Kafka offsets:
java.lang.AssertionError: assertion failed: If startingOffsets contains specific offsets, you must specify all TopicPartitions.
Use -1 for latest, -2 for earliest, if you don't care.
Specified: Set(test1-0) Assigned: Set(TEST1-0)
Solved the problem. It was a bug in the Spark library; upgrading to a newer version resolved it.
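For reference, once the topic-case bug is out of the way, these are the JSON shapes Spark's Kafka source expects for the options used above (values taken from the question; a format sketch, not a guarantee for every Spark version):

```
"assign"          -> {"TEST1":[0]}
"startingOffsets" -> {"TEST1":{"0":172260244}}
"endingOffsets"   -> {"TEST1":{"0":-1}}
```

Two constraints follow from the error message itself: every TopicPartition listed in `assign` must also appear in `startingOffsets`, with the topic name in exactly the same case (`-2` means earliest, `-1` means latest). Also note that `endingOffsets` only applies to batch queries (`spark.read`); a streaming query (`readStream`) ignores or rejects it.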