为什么 Spark Streaming 在之前的主题记录上是 运行?

Why spark streaming is running on previous topics records?

我 运行 动物园管理员和卡夫卡经纪人但我没有 运行 卡夫卡制作人。 我 运行 激发流代码并在此处打印未过滤的流。 我的问题是,为什么我会收到这些数据流,即

{"vehicleId":"0","lon":"0","lat":"0","ts":"0"}

{"vehicleId":"0","lon":"0","lat":"0","ts":"0"}

虽然我不是运行制作人? 下面这些消息是什么意思?

19/06/24 20:20:00 INFO JobScheduler: Finished job streaming job 1561378800000 ms.0 from job set of time 1561378800000 ms
19/06/24 20:20:00 INFO JobScheduler: Total delay: 0.028 s for time 1561378800000 ms (execution: 0.021 s)
19/06/24 20:20:00 INFO MapPartitionsRDD: Removing RDD 161 from persistence list
19/06/24 20:20:00 INFO ContextCleaner: Cleaned accumulator 1716
19/06/24 20:20:00 INFO ContextCleaner: Cleaned accumulator 1893
19/06/24 20:20:00 INFO ContextCleaner: Cleaned accumulator 1944
{"vehicleId":"0","lon":"0","lat":"0","ts":"0"}
{"vehicleId":"0","lon":"0","lat":"0","ts":"0"}
{"vehicleId":"0","lon":"0","lat":"0","ts":"0"}
{"vehicleId":"0","lon":"0","lat":"0","ts":"0"}
{"vehicleId":"0","lon":"0","lat":"0","ts":"0"}
{"vehicleId":"0","lon":"0","lat":"0","ts":"0"}
{"vehicleId":"0","lon":"0","lat":"0","ts":"0"}
{"vehicleId":"0","lon":"0","lat":"0","ts":"0"}
{"vehicleId":"0","lon":"0","lat":"0","ts":"0"}
{"vehicleId":"0","lon":"0","lat":"0","ts":"0"}
...

19/06/24 20:20:00 INFO KafkaRDD: Removing RDD 160 from persistence list
19/06/24 20:20:00 INFO ContextCleaner: Cleaned accumulator 1628
19/06/24 20:20:00 INFO ContextCleaner: Cleaned accumulator 1781
19/06/24 20:20:00 INFO ContextCleaner: Cleaned accumulator 1570
19/06/24 20:20:00 INFO BlockManager: Removing RDD 161
19/06/24 20:20:00 INFO ContextCleaner: Cleaned accumulator 1808
19/06/24 20:20:00 INFO ContextCleaner: Cleaned accumulator 2020
19/06/24 20:20:00 INFO ContextCleaner: Cleaned accumulator 1624
19/06/24 20:20:00 INFO ContextCleaner: Cleaned accumulator 1918
19/06/24 20:20:00 INFO ContextCleaner: Cleaned

您可能需要检查您在 'auto.offset.reset' 中设置的内容。

来自 Spark 流媒体指南:

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "localhost:9092,anotherhost:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "use_a_separate_group_id_for_each_stream",
  "auto.offset.reset" -> "latest",
  "enable.auto.commit" -> (false: java.lang.Boolean)
)

他们将偏移重置设置为 "latest"。你的似乎设置为最早。