Google Cloud Storage connector within SparkR on Dataproc
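I see that the gs:// interface is available in Spark and PySpark on Dataproc clusters, but it doesn't work in the SparkR shell. Is there a way to get it to work? If you run it, the path isn't found at all. I'm aware of the cloudyR project.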
The gs:// interface should work in the SparkR shell on Dataproc if you use it through the DataFrame read interface, for example:
> df <- read.parquet("gs://public-datasets/natality/parquet/")
> printSchema(df)
root
|-- source_year: integer (nullable = true)
|-- year: integer (nullable = true)
|-- month: integer (nullable = true)
|-- day: string (nullable = true)
|-- wday: integer (nullable = true)
|-- state: string (nullable = true)
|-- is_male: string (nullable = true)
|-- child_race: integer (nullable = true)
|-- weight_pounds: double (nullable = true)
|-- plurality: integer (nullable = true)
|-- apgar_1min: integer (nullable = true)
|-- apgar_5min: integer (nullable = true)
|-- mother_residence_state: string (nullable = true)
|-- mother_race: integer (nullable = true)
|-- mother_age: integer (nullable = true)
|-- gestation_weeks: integer (nullable = true)
|-- lmp: string (nullable = true)
|-- mother_married: string (nullable = true)
|-- mother_birth_state: string (nullable = true)
|-- cigarette_use: string (nullable = true)
|-- cigarettes_per_day: integer (nullable = true)
|-- alcohol_use: string (nullable = true)
|-- drinks_per_week: integer (nullable = true)
|-- weight_gain_pounds: integer (nullable = true)
|-- born_alive_alive: integer (nullable = true)
|-- born_alive_dead: integer (nullable = true)
|-- born_dead: integer (nullable = true)
|-- ever_born: integer (nullable = true)
|-- father_race: integer (nullable = true)
|-- father_age: integer (nullable = true)
|-- record_weight: integer (nullable = true)
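
As a rough sketch of why the path may appear to be missing: base R file functions only look at the local filesystem, while SparkR's readers go through Spark, which is where gs:// paths are resolved on Dataproc. The question does not show the exact command that failed, so `file.exists` below is only an illustrative stand-in, and the column selection is just one way to peek at the data. This assumes a SparkR shell on a Dataproc cluster with the session already started.

```r
# Assumption: running inside the SparkR shell on Dataproc (Spark 2.x style API,
# matching the read.parquet() call above, which takes no sqlContext argument).

# Base R checks the local filesystem only, so a gs:// URI is not
# something it can resolve. (Illustrative stand-in for the failing call.)
file.exists("gs://public-datasets/natality/parquet/")

# The generic SparkR reader goes through Spark, so the same path can be read.
df <- read.df("gs://public-datasets/natality/parquet/", source = "parquet")

# Look at a few columns that appear in the schema printed above.
head(select(df, "year", "state", "weight_pounds"))
```

The same reasoning would apply to other SparkR readers such as `read.json` or `read.text`: they take the path as a Spark data source location rather than as a local file.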