Spark submit class not found error: java.lang.ClassNotFoundException
I have JDK 8 installed, and the path is also set up correctly, so I don't know why I am getting this error. What am I doing wrong? Please help me fix this problem. My Spark version is 2.4.7 and I am using the IntelliJ IDE.
This is the error I get when I try to run the code:
C:\spark\spark-2.4.7-bin-hadoop2.7\bin>spark-submit --class TopViewedCategories --master local C:\Users\Piyush\IdeaProjects\BDA\target\BDA-1.0-SNAPSHOT.jar
21/05/03 16:25:37 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/05/03 16:25:37 WARN SparkSubmit$$anon: Failed to load TopViewedCategories.
java.lang.ClassNotFoundException: TopViewedCategories
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:238)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:806)
at org.apache.spark.deploy.SparkSubmit.doRunMain(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
21/05/03 16:25:37 INFO ShutdownHookManager: Shutdown hook called
21/05/03 16:25:37 INFO ShutdownHookManager: Deleting directory
Here is the code:
package org.example;

import java.util.List;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class TopViewedCategories {

    public static void main(String[] args) throws Exception {
        long timeElapsed = System.currentTimeMillis();
        System.out.println("Started Processing");

        SparkConf conf = new SparkConf()
                .setMaster("local")
                .setAppName("YouTubeDM");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // Valid log levels include: ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, WARN
        sc.setLogLevel("ERROR");

        // directory where the files are; note that backslashes must be
        // escaped in Java string literals
        JavaRDD<String> mRDD = sc.textFile("C:\\Users\\Piyush\\Desktop\\bda\\INvideos.csv");

        JavaPairRDD<Double, String> sortedRDD = mRDD
                // .filter(line -> line.split("\t").length > 6)
                .mapToPair(line -> {
                    String[] lineArr = line.split("\t");
                    String category = lineArr[5];
                    Double views = Double.parseDouble(lineArr[1]);
                    return new Tuple2<>(category, new Tuple2<>(views, 1));
                })
                // sum the views and the record counts per category
                .reduceByKey((x, y) -> new Tuple2<>(x._1 + y._1, x._2 + y._2))
                // average views per category
                .mapToPair(x -> new Tuple2<>(x._1, x._2._1 / x._2._2))
                .mapToPair(Tuple2::swap)
                .sortByKey(false);
        // .take(10);

        long count = sortedRDD.count();
        List<Tuple2<Double, String>> topTenTuples = sortedRDD.take(10);
        JavaPairRDD<Double, String> topTenRdd = sc.parallelizePairs(topTenTuples);

        String output_dir = "C:output/spark/TopViewedCategories";
        // remove the output directory if it is already there
        FileSystem fs = FileSystem.get(sc.hadoopConfiguration());
        fs.delete(new Path(output_dir), true); // delete dir, true for recursive

        topTenRdd.saveAsTextFile(output_dir);

        timeElapsed = System.currentTimeMillis() - timeElapsed;
        System.out.println("Done. Time taken (in seconds): " + timeElapsed / 1000f);
        System.out.println("Processed Records: " + count);

        sc.stop();
        sc.close();
    }
}
Please help me figure this out.
You must pass the class name including its package. The class is declared in package org.example, so the fully qualified name is org.example.TopViewedCategories:
spark-submit --class org.example.TopViewedCategories ...
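Applied to the command from the question, the full invocation would look like this (assuming the jar path is unchanged):

spark-submit --class org.example.TopViewedCategories --master local C:\Users\Piyush\IdeaProjects\BDA\target\BDA-1.0-SNAPSHOT.jar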
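If you are ever unsure of the fully qualified name, you can list the jar's contents and look for the .class entry; its path inside the jar mirrors the package. A quick check on Windows, using the JDK's jar tool and findstr with the jar path from the question:

jar tf C:\Users\Piyush\IdeaProjects\BDA\target\BDA-1.0-SNAPSHOT.jar | findstr TopViewedCategories

This should print org/example/TopViewedCategories.class, which confirms the name to pass to --class.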