Read Avro in Azure HDI4.0

I am trying to read Avro files from a Jupyter notebook on Azure HDInsight 4.0 with Spark 2.4. I am unable to supply the .jar file correctly.

I have tried the approaches suggested in "How to use Avro on HDInsight Spark/Jupyter?" and in https://docs.microsoft.com/en-in/azure/hdinsight/spark/apache-spark-jupyter-notebook-use-external-packages, but I guess they apply to Spark 2.3:

%%configure
{ "conf": {"spark.jars.packages": "com.databricks:spark-avro_2.11:4.0.0" }}

This produces the error message:

pyspark.sql.utils.AnalysisException: 'Failed to find data source: avro. Avro is built-in but external data source module since Spark 2.4. Please deploy the application as per the deployment section of "Apache Avro Data Source Guide".;'

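The solution that seems to work is to load the org.apache.spark spark-avro package that matches Spark 2.4 instead of the Databricks one: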

%%configure -f 
{ "conf": {"spark.jars.packages": "org.apache.spark:spark-avro_2.11:2.4.0" }}
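For anyone adapting this to a different cluster, here is a minimal sketch of how the package coordinate and the read call fit together. The helper names (avro_package, configure_body, read_avro) are hypothetical, and the sketch assumes the spark-avro version should match the cluster's Spark version and that the Scala build is 2.11, as in the coordinate above.

```python
import json


def avro_package(spark_version, scala_version="2.11"):
    # Maven coordinate of the spark-avro module that ships alongside Spark 2.4+.
    # Assumption: the module version is the same as the cluster's Spark version.
    return "org.apache.spark:spark-avro_{}:{}".format(scala_version, spark_version)


def configure_body(spark_version, scala_version="2.11"):
    # JSON body to paste on the line after a `%%configure -f` cell magic.
    conf = {"spark.jars.packages": avro_package(spark_version, scala_version)}
    return json.dumps({"conf": conf})


def read_avro(spark, path):
    # `spark` is the SparkSession the PySpark kernel provides in the notebook.
    # With spark-avro loaded, the short format name "avro" resolves.
    return spark.read.format("avro").load(path)


# Print the body for Spark 2.4.0 so it can be copied into the notebook cell.
print(configure_body("2.4.0"))
```

Once the session has restarted with the package, something like `df = read_avro(spark, "/example/data/sample.avro")` should return a DataFrame; the path here is only a placeholder.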