Spark classpath in HDFS

Question

For Spark jobs running on YARN (yarn-client mode), is it possible to specify the classpath using JARs located in HDFS?

Something like what is possible for MapReduce jobs with:

DistributedCache.addFileToClassPath(Path file, Configuration conf, FileSystem fs)

Answer 1

From the SparkContext documentation:

def addJar(path: String): Unit

Adds a JAR dependency for all tasks to be executed on this SparkContext in the future. The path passed can be either a local file, a file in HDFS (or other Hadoop-supported filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.

So I think it is enough to add this where you initialize your SparkContext:

sc.addJar("hdfs://your/path/to/whatever.jar")

If you only want to add a plain file rather than a JAR, there is a related addFile() method.
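As a rough illustration, the two calls could sit together in a small driver program like the sketch below. The application name, the object name and both HDFS paths are hypothetical placeholders, and the master is assumed to be supplied by spark-submit (for example `--master yarn --deploy-mode client`) rather than set in code. As far as I know, addJar makes the JAR available to tasks on the executors, so classes the driver itself needs may still have to be on the driver's own classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext, SparkFiles}

object HdfsClasspathExample {
  def main(args: Array[String]): Unit = {
    // Master and deploy mode are assumed to come from spark-submit.
    val conf = new SparkConf().setAppName("hdfs-classpath-example")
    val sc = new SparkContext(conf)

    // Hypothetical HDFS locations; replace them with real paths.
    // The JAR is shipped to executors for tasks run on this context.
    sc.addJar("hdfs:///libs/my-dependency.jar")
    // The file is downloaded to every node that runs a task.
    sc.addFile("hdfs:///data/lookup.txt")

    // Inside a task, SparkFiles.get resolves the local copy of a file
    // that was registered with addFile, using its file name.
    val sizes = sc.parallelize(1 to 2)
      .map(_ => new java.io.File(SparkFiles.get("lookup.txt")).length())
      .collect()
    println(sizes.mkString(", "))

    sc.stop()
  }
}
```

Both calls should be made before the jobs that depend on them run, since they apply to tasks executed afterwards.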

For more information, see the docs.
