Read Avro in Azure HDI4.0
I am trying to read an Avro file from a Jupyter notebook on Azure HDInsight 4.0 with Spark 2.4.
I cannot get the .jar file supplied correctly.
I have tried the approaches suggested in "How to use Avro on HDInsight Spark/Jupyter?" and in https://docs.microsoft.com/en-in/azure/hdinsight/spark/apache-spark-jupyter-notebook-use-external-packages, but I suspect they apply to Spark 2.3:
%%configure
{ "conf": {"spark.jars.packages": "com.databricks:spark-avro_2.11:4.0.0" }}
This produces the error message:
pyspark.sql.utils.AnalysisException: 'Failed to find data source: avro. Avro is built-in but external data source module since Spark 2.4. Please deploy the application as per the deployment section of "Apache Avro Data Source Guide".;'
The solution that seems to work is to load the Apache Spark Avro module instead of the Databricks package:
%%configure -f
{ "conf": {"spark.jars.packages": "org.apache.spark:spark-avro_2.11:2.4.0" }}
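The working cell follows the usual pattern for Spark 2.4 and later, where the Avro module is published as org.apache.spark:spark-avro_<Scala version>:<Spark version>. As a rough sketch, the cell text can be generated so the package version stays in step with the cluster. The helper names spark_avro_coordinate and configure_cell are hypothetical, and the Scala default of 2.11 is an assumption taken from the coordinate above:

```python
import json


def spark_avro_coordinate(spark_version: str, scala_version: str = "2.11") -> str:
    """Build the Maven coordinate of the spark-avro module for a Spark version.

    Hypothetical helper; assumes the org.apache.spark:spark-avro_<scala>:<spark>
    naming used by the working cell above.
    """
    return f"org.apache.spark:spark-avro_{scala_version}:{spark_version}"


def configure_cell(spark_version: str, scala_version: str = "2.11") -> str:
    """Return the text of a %%configure cell that loads the matching package."""
    conf = {
        "conf": {
            "spark.jars.packages": spark_avro_coordinate(spark_version, scala_version)
        }
    }
    # -f forces the session to restart so the new configuration is applied
    return "%%configure -f\n" + json.dumps(conf)


if __name__ == "__main__":
    # Paste the printed text into the first cell of the notebook
    print(configure_cell("2.4.0"))
```

Once the session has restarted with that package, the file should be readable with spark.read.format("avro").load(path), where path stands for your own storage location.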