编码器问题 - Spark Structured streaming - 仅适用于 repl
enocder issue- Spark Structured streaming- works in repl only
我有一个使用模式 reg 摄取和反序列化 kafka avro 消息的工作流程。它在 REPL 中运行良好,但当我尝试编译时,我得到
Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.
[error] .map(x => {
我不确定是否需要修改我的对象,但如果 REPL 工作正常,我为什么需要修改。
object AgentDeserializerWrapper {
val props = new Properties()
props.put(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, schemaRegistryURL)
props.put(KafkaAvroDeserializerConfig.SPECIFIC_AVRO_READER_CONFIG, "true")
val vProps = new kafka.utils.VerifiableProperties(props)
val deser = new KafkaAvroDecoder(vProps)
val avro_schema = new RestService(schemaRegistryURL).getLatestVersion(subjectValueNameAgentRead)
val messageSchema = new Schema.Parser().parse(avro_schema.getSchema)
}
case class DeserializedFromKafkaRecord( value: String)
import spark.implicits._
val agentStringDF = spark
.readStream
.format("kafka")
.option("subscribe", "agent")
.options(kafkaParams)
.load()
.map(x => {
DeserializedFromKafkaRecord(AgentDeserializerWrapper.deser.fromBytes(x.getAs[Array[Byte]]("value"), AgentDeserializerWrapper.messageSchema).asInstanceOf[GenericData.Record].toString)
})
添加为[DeserializedFromKafkaRecord],以便静态键入您的数据集:
val agentStringDF = spark
.readStream
.format("kafka")
.option("subscribe", "agent")
.options(kafkaParams)
.load()
.as[DeserializedFromKafkaRecord]
.map(x => {
DeserializedFromKafkaRecord(AgentDeserializerWrapper.deser.fromBytes(x.getAs[Array[Byte]]("value"), AgentDeserializerWrapper.messageSchema).asInstanceOf[GenericData.Record].toString)
})
我有一个使用模式 reg 摄取和反序列化 kafka avro 消息的工作流程。它在 REPL 中运行良好,但当我尝试编译时,我得到
Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.
[error] .map(x => {
我不确定是否需要修改我的对象,但如果 REPL 工作正常,我为什么需要修改。
object AgentDeserializerWrapper {
val props = new Properties()
props.put(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, schemaRegistryURL)
props.put(KafkaAvroDeserializerConfig.SPECIFIC_AVRO_READER_CONFIG, "true")
val vProps = new kafka.utils.VerifiableProperties(props)
val deser = new KafkaAvroDecoder(vProps)
val avro_schema = new RestService(schemaRegistryURL).getLatestVersion(subjectValueNameAgentRead)
val messageSchema = new Schema.Parser().parse(avro_schema.getSchema)
}
case class DeserializedFromKafkaRecord( value: String)
import spark.implicits._
val agentStringDF = spark
.readStream
.format("kafka")
.option("subscribe", "agent")
.options(kafkaParams)
.load()
.map(x => {
DeserializedFromKafkaRecord(AgentDeserializerWrapper.deser.fromBytes(x.getAs[Array[Byte]]("value"), AgentDeserializerWrapper.messageSchema).asInstanceOf[GenericData.Record].toString)
})
添加为[DeserializedFromKafkaRecord],以便静态键入您的数据集:
val agentStringDF = spark
.readStream
.format("kafka")
.option("subscribe", "agent")
.options(kafkaParams)
.load()
.as[DeserializedFromKafkaRecord]
.map(x => {
DeserializedFromKafkaRecord(AgentDeserializerWrapper.deser.fromBytes(x.getAs[Array[Byte]]("value"), AgentDeserializerWrapper.messageSchema).asInstanceOf[GenericData.Record].toString)
})