SparkR on Dataproc (Spark 1.5.x) does not work

When I try to use SparkR on a Cloud Dataproc cluster (version 0.2), I get the following error:

Exception in thread "main" java.io.FileNotFoundException: /usr/lib/spark/R/lib/sparkr.zip (Permission denied)
    at java.io.FileOutputStream.open0(Native Method)
    at java.io.FileOutputStream.open(FileOutputStream.java:270)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
    at org.apache.spark.deploy.RPackageUtils$.zipRLibraries(RPackageUtils.scala:215)
    at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:371)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:153)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

How can I fix this so that I can use SparkR?

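This issue is caused by a bug in the Spark 1.5 series (JIRA here): as the stack trace shows, spark-submit tries to write sparkr.zip into /usr/lib/spark/R/lib, and the submitting user has no write permission on that directory. To fix this, run the following command on the master node, either by SSHing into it or by using an initialization action.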

sudo chmod 777 /usr/lib/spark/R/lib
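
If you go the initialization action route, the command above can be wrapped in a small script. This is only a minimal sketch: the function name make_sparkr_lib_writable and the SPARKR_LIB_DIR override variable are hypothetical names chosen for illustration, and the directory is the one from the stack trace. Initialization actions run as root, so sudo is not needed inside the script.

```shell
#!/usr/bin/env bash
# Sketch of an initialization action that applies the chmod workaround.
# Assumption: the script runs as root (as initialization actions do),
# so no sudo is required here.
set -euo pipefail

# Directory taken from the stack trace. SPARKR_LIB_DIR is a hypothetical
# override, handy for trying the script out outside a cluster.
SPARKR_LIB_DIR="${SPARKR_LIB_DIR:-/usr/lib/spark/R/lib}"

# Make the given directory world-writable if it exists; otherwise do nothing.
make_sparkr_lib_writable() {
  local dir="$1"
  if [[ -d "${dir}" ]]; then
    chmod 777 "${dir}"
    echo "made ${dir} world-writable"
  else
    echo "skipping: ${dir} not found"
  fi
}

make_sparkr_lib_writable "${SPARKR_LIB_DIR}"
```

An initialization action runs on every node of the cluster, not only the master. The script skips nodes where the directory is missing, and changing the permissions on a worker does no harm.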

This issue should be fixed in Spark 1.6, which Cloud Dataproc will eventually support in a future image version.