Reset BigQuery Table from Spark ETL

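I have a question. I have an ETL built in Databricks that loads data into BigQuery, and I want to wipe the BigQuery table before each run of the ETL. Is that possible? Sorry for the newbie question! Thanks!!!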

When you load data, two of the properties under configuration.load in the jobs.insert request (among many others) control what happens to the table you load into, and how:

configuration.load.writeDisposition

[Optional] Specifies the action that occurs if the destination table already exists.

The following values are supported:
WRITE_TRUNCATE: If the table already exists, BigQuery overwrites the table data.
WRITE_APPEND: If the table already exists, BigQuery appends the data to the table.
WRITE_EMPTY: If the table already exists and contains data, a 'duplicate' error is returned in the job result.
The default value is WRITE_APPEND.

Each action is atomic and only occurs if BigQuery is able to complete the job successfully. Creation, truncation and append actions occur as one atomic update upon job completion.

and

configuration.load.createDisposition

[Optional] Specifies whether the job is allowed to create new tables.

The following values are supported:
CREATE_IF_NEEDED: If the table does not exist, BigQuery creates the table.
CREATE_NEVER: The table must already exist. If it does not, a 'notFound' error is returned in the job result.
The default value is CREATE_IF_NEEDED.

Creation, truncation and append actions occur as one atomic update upon job completion.
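To see where these two properties sit in an actual request, here is a minimal sketch that builds a jobs.insert request body for a load job that replaces the table on every run. The function name, the project/dataset/table/URI values, and the choice of Parquet as the staged file format are assumptions for illustration only; the field names (`sourceUris`, `sourceFormat`, `destinationTable`, `writeDisposition`, `createDisposition`) are the ones from the load configuration described above.

```python
import json


def build_truncating_load_job(project_id: str, dataset_id: str, table_id: str,
                              source_uris: list) -> dict:
    """Hypothetical helper: return a jobs.insert body for a load job that
    replaces the destination table's contents instead of appending."""
    return {
        "configuration": {
            "load": {
                "sourceUris": source_uris,
                # Assumption: the ETL stages its output as Parquet files in GCS.
                "sourceFormat": "PARQUET",
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
                # Overwrite existing rows rather than using the default WRITE_APPEND.
                "writeDisposition": "WRITE_TRUNCATE",
                # Create the table on the first run if it does not exist (the default).
                "createDisposition": "CREATE_IF_NEEDED",
            }
        }
    }


if __name__ == "__main__":
    # Placeholder names; substitute your own project, dataset, table and bucket.
    body = build_truncating_load_job("my-project", "my_dataset", "my_table",
                                     ["gs://my-bucket/etl-output/*.parquet"])
    print(json.dumps(body, indent=2))
```

Because the truncation happens as part of the same atomic load job, a failed run leaves the previous table contents in place rather than an empty table.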

So WRITE_TRUNCATE is what you are looking for.
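From a PySpark job you usually don't build the load request yourself; you write through the spark-bigquery connector. A minimal sketch, assuming the connector is available on the Databricks cluster and you use its indirect write path (data staged in a GCS bucket, then loaded with a BigQuery load job). With that path, `mode("overwrite")` should, to my understanding, result in a truncating load rather than an append; verify against the connector version you run. The function name, table name and bucket name are hypothetical.

```python
def overwrite_bigquery_table(df, table: str, temp_gcs_bucket: str) -> None:
    """Hypothetical helper: replace the contents of `table` ("dataset.table")
    with the rows of the Spark DataFrame `df` via the spark-bigquery connector."""
    (
        df.write.format("bigquery")
        # Indirect write: rows are staged in this GCS bucket, then loaded into BigQuery.
        .option("temporaryGcsBucket", temp_gcs_bucket)
        # Overwrite mode: replace the existing table data instead of appending to it.
        .mode("overwrite")
        .save(table)
    )
```

Usage at the end of each ETL run would look like `overwrite_bigquery_table(result_df, "my_dataset.my_table", "my-temp-bucket")`, so no separate "delete table" step is needed beforehand.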
