Writing dataframe to parquet files fails with empty or nested empty schemas
I am new to both Scala and Spark, so this may be a very silly question. I have a dataframe created from Elasticsearch, and I am trying to write it to S3 in parquet format. Below is my code block and the error I see. Could someone kindly help me figure this out?
val dfSchema = dataFrame.schema.json
// log.info(dfSchema)
dataFrame
.withColumn("lastFound", functions.date_add(dataFrame.col("last_found"), -457))
.write
.partitionBy("lastFound")
.mode("append")
.format("parquet")
.option("schema", dfSchema)
.save("/tmp/elasticsearch/")
org.apache.spark.sql.AnalysisException:
Datasource does not support writing empty or nested empty schemas.
Please make sure the data schema has at least one or more column(s).
;
at org.apache.spark.sql.execution.datasources.DataSource$.org$apache$spark$sql$execution$datasources$DataSource$$validateSchema(DataSource.scala:733)
at org.apache.spark.sql.execution.datasources.DataSource.planForWriting(DataSource.scala:523)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:290)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
When you write data in parquet format, you don't need to provide a schema.
When you use append mode, you are assuming that data already exists at that exact path and you want to add new data to it. If you want to overwrite, you can use "overwrite" instead of "append"; if the path is new, you don't need to specify anything.
When writing to S3, the path should generally look like "s3://bucket/the folder".
Can you try this:
dataFrame
.withColumn("lastFound", functions.date_add(dataFrame.col("last_found"), -457))
.write
.partitionBy("lastFound")
.mode("append")
.parquet("/tmp/elasticsearch/")