ClassNotFoundException when submitting a YARN job to a remote cluster
I have a pseudo-distributed Hadoop cluster running as a Docker container:
docker run -d -p 50070:50070 -p 9000:9000 -p 8032:8032 -p 8088:8088 --name had00p sequenceiq/hadoop-docker:2.6.0 /etc/bootstrap.sh -d
Its configuration lives here: https://github.com/sequenceiq/docker-hadoop-ubuntu/
I can work with HDFS and reach the web UI without problems, but when I try to submit a job from Java I get:
ClassNotFoundException: Class com.github.mikhailerofeev.hadoop.Script$MyMapper not found
The relevant code:
@Override
public Configuration getConf() {
    String host = BOOT_TO_DOCKER_IP;
    int nameNodeHdfsPort = 9000;
    int yarnPort = 8032;
    String yarnAddr = host + ":" + yarnPort;
    String hdfsAddr = "hdfs://" + host + ":" + nameNodeHdfsPort + "/";
    Configuration configuration = new Configuration();
    configuration.set("yarn.resourcemanager.address", yarnAddr);
    configuration.set("mapreduce.framework.name", "yarn");
    configuration.set("fs.default.name", hdfsAddr);
    return configuration;
}
private void simpleMr(String inputPath) throws IOException {
    JobConf conf = new JobConf(getConf(), Script.class);
    conf.setJobName("fun");
    conf.setJarByClass(MyMapper.class);
    conf.setMapperClass(MyMapper.class);
    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);
    FileInputFormat.setInputPaths(conf, inputPath);
    String tmpMRreturn = "/user/m-erofeev/map-test.data";
    Path returnPath = new Path(tmpMRreturn);
    FileOutputFormat.setOutputPath(conf, returnPath);
    AccessUtils.execAsRootUnsafe(() -> {
        FileSystem fs = FileSystem.get(getConf());
        if (fs.exists(returnPath)) {
            fs.delete(returnPath, true);
        }
    });
    AccessUtils.execAsRootUnsafe(() -> {
        RunningJob runningJob = JobClient.runJob(conf);
        runningJob.waitForCompletion();
    });
}
Here AccessUtils.execAsRootUnsafe is a wrapper around UserGroupInformation; it works fine for the HDFS calls.
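The wrapper itself is not shown in the question; a minimal sketch of what such a helper might look like, assuming it impersonates the container's "root" user via UserGroupInformation.doAs (the interface name and user are hypothetical reconstructions, not the author's actual code):

```java
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;

public final class AccessUtils {

    // Functional interface so callers can pass a throwing lambda: () -> { ... }
    @FunctionalInterface
    public interface UnsafeRunnable {
        void run() throws Exception;
    }

    // Executes the body as the remote user "root", so HDFS/YARN calls
    // carry that identity instead of the local OS username.
    public static void execAsRootUnsafe(UnsafeRunnable body) {
        try {
            UserGroupInformation ugi = UserGroupInformation.createRemoteUser("root");
            ugi.doAs((PrivilegedExceptionAction<Void>) () -> {
                body.run();
                return null;
            });
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

This pattern only changes the user identity the client presents; it has no effect on how job classes are shipped to the cluster, which is why it is unrelated to the ClassNotFoundException.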
Where am I going wrong?
UPD: I realized this might be expected to fail, since the Hadoop image runs Java 7 while I am on Java 8, and planned to check that later. But in that case I would expect a different error message...
UPD2: switching to Java 7 made no difference.
My mistake: I was running the code straight from the IDE without packaging it into a jar, so setJarByClass() had no jar to locate.
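For anyone hitting the same error: YARN can only ship your mapper/reducer classes to the worker nodes if they are packaged in a job jar. setJarByClass() merely searches the classpath for the jar containing the given class; when running from an IDE the class lives in a classes/ directory, nothing is found, and the remote nodes fail with ClassNotFoundException. A minimal sketch of the fix, assuming a Maven build (the artifact path below is hypothetical):

```java
// 1. Build the job jar first, e.g.:  mvn package
// 2. Point the JobConf at the built jar explicitly, so the client
//    uploads it to the cluster along with the job submission.
JobConf conf = new JobConf(getConf(), Script.class);
conf.setJar("target/hadoop-script-1.0.jar"); // hypothetical path to the built jar
conf.setMapperClass(MyMapper.class);
// ... remaining job setup as before
```

Alternatively, keep setJarByClass(MyMapper.class) and simply launch the program with `hadoop jar target/hadoop-script-1.0.jar ...` instead of from the IDE, so the class genuinely resolves from a jar.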