如何使用 AvroParquetWriter 从 scala case class 制作镶木地板文件?

How to make a parquet file from scala case class using AvroParquetWriter?

我有一个案例 class 如下所示:

case class Person(id:Int,name: String)

现在,我编写了以下方法,使用 AvroParquetWriter.[=16= 从 Seq[T] 制作镶木地板文件]

  def writeToFile[T](data: Iterable[T], schema: Schema, path: String, accessKey: String, secretKey: String): Unit = {
    val conf = new Configuration

    conf.set("fs.s3.awsAccessKeyId", accessKey)
    conf.set("fs.s3.awsSecretAccessKey", secretKey)

    val s3Path = new Path(path)
    val writer = AvroParquetWriter.builder[T](s3Path)
      .withConf(conf)
      .withSchema(schema)
      .withWriteMode(ParquetFileWriter.Mode.OVERWRITE)
      .build()
      .asInstanceOf[ParquetWriter[T]]

    data.foreach(writer.write)

    writer.close()
  }

架构是:

val schema = SchemaBuilder
    .record("Person")
      .fields()
      .requiredInt("id")
      .requiredString("name")
      .endRecord()

现在,当我使用以下代码调用 writeToFile 时,出现异常:

val personData = Seq(Person(1,"A"),Person(2,"B"))

ParquetService.writeToFile[Person](
      data = personData,
      schema = schema,
      path = s3Path,
      accessKey = accessKey,
      secretKey = secretKey

java.lang.ClassCastException:com.entities.Person cannot be cast to org.apache.avro.generic.IndexedRecord

为什么 Person 不能转换为 IndexedRecord?我需要做些什么来摆脱这个异常吗?

我有一个类似的问题,根据这个例子

https://github.com/apache/parquet-mr/blob/f84938441be49c665595c936ac631c3e5f171bf9/parquet-avro/src/test/java/org/apache/parquet/avro/TestReflectReadWrite.java#L141

您缺少一种对编写器生成器的方法调用。

val writer = AvroParquetWriter.builder[T](s3Path)
  .withConf(conf)
  .withSchema(schema)
  .withDataModel(ReflectData.get) //This one
  .withWriteMode(ParquetFileWriter.Mode.OVERWRITE)
  .build()

此外,如果您希望在数据中支持空值,您可以使用 ReflectData.AllowNull.get()