How to write Twitter tweets to HDFS using the Spark Streaming Java API
SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("SparkTwitterHelloWorldExample");
JavaStreamingContext jssc = new JavaStreamingContext(conf, new Duration(60000));
System.setProperty("twitter4j.oauth.consumerKey", consumerKey);
System.setProperty("twitter4j.oauth.consumerSecret", consumerSecret);
System.setProperty("twitter4j.oauth.accessToken", accessToken);
System.setProperty("twitter4j.oauth.accessTokenSecret", accessTokenSecret);
String[] filters = new String[] {"Narendra Modi"};
JavaReceiverInputDStream<Status> twitterStream = TwitterUtils.createStream(jssc, filters);
// Keep only the text of each tweet that matches the filter
JavaDStream<String> statuses = twitterStream.map(
    new Function<Status, String>() {
        public String call(Status status) { return status.getText(); }
    }
);
statuses.print();
statuses.saveAsHadoopFiles("hdfs://HadoopSystem-150s:8020/Spark_Twitter_out","txt");
I am able to fetch the tweets, but I get an error when writing them to HDFS.
Can anyone help me save the tweets to HDFS using Java?
This is the error I get:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project SparkTwitterHelloWorldExample: Compilation failure
[ERROR] /home/Hadoop/Mani/SparkTwitterHelloWorldExample-master/src/main/java/de/michaelgoettsche/SparkTwitterHelloWorldExample.java:[58,17] cannot find symbol
[ERROR] symbol : method saveAsHadoopFiles(java.lang.String,java.lang.String)
[ERROR] location: class org.apache.spark.streaming.api.java.JavaDStream
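You need to use the saveAsTextFiles() method. The Hadoop output formats (saveAsHadoopFiles) only work on a JavaPairDStream, because they need a key and a value for every record.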
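To make the key/value requirement concrete, here is a plain-Java sketch with no Spark or Hadoop dependency. It pairs each tweet text with a null key (standing in for Hadoop's NullWritable, an assumption made for illustration) and mimics how Hadoop's TextOutputFormat lays out one record as a line: key, a tab, then value, with a null side left out. The class and method names (PairRecordSketch, toPairs, formatLine) are hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; no Spark or Hadoop classes are used.
public class PairRecordSketch {

    // Pair every tweet text with a null key (a stand-in for NullWritable.get()),
    // i.e. the shape a mapToPair step would produce before saveAsHadoopFiles.
    static List<Map.Entry<Object, String>> toPairs(List<String> tweets) {
        List<Map.Entry<Object, String>> pairs = new ArrayList<>();
        for (String text : tweets) {
            pairs.add(new SimpleEntry<Object, String>(null, text));
        }
        return pairs;
    }

    // Mimics TextOutputFormat's line layout: "key<TAB>value".
    // A null key or value is omitted; returns null when nothing would be written.
    static String formatLine(Object key, Object value) {
        boolean nullKey = key == null;
        boolean nullValue = value == null;
        if (nullKey && nullValue) {
            return null;
        }
        StringBuilder line = new StringBuilder();
        if (!nullKey) {
            line.append(key);
        }
        if (!nullKey && !nullValue) {
            line.append('\t');
        }
        if (!nullValue) {
            line.append(value);
        }
        return line.toString();
    }

    public static void main(String[] args) {
        List<String> tweets = Arrays.asList("first tweet", "second tweet");
        for (Map.Entry<Object, String> pair : toPairs(tweets)) {
            System.out.println(formatLine(pair.getKey(), pair.getValue()));
        }
        // A keyed record gets the key and a tab in front of the text.
        System.out.println(formatLine("user42", "keyed tweet"));
    }
}
```

In real Spark code the pairing step would be a mapToPair on the JavaDStream, and the resulting JavaPairDStream could then go to the saveAsHadoopFiles overload that also takes the key, value, and OutputFormat classes; check the exact signature against your Spark version.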
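Since JavaDStream does not expose saveAsTextFiles() itself, the workaround is to call it on the underlying Scala DStream: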
statuses.dstream().saveAsTextFiles(prefix, suffix);
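Note that saveAsTextFiles() does not write one file: Spark Streaming documents that each batch is saved under a name generated as "prefix-TIME_IN_MS[.suffix]". The plain-Java sketch below reproduces that naming rule so you can see where the tweets from each 60-second batch in the question would land. The helper batchOutputPath and the sample timestamp are hypothetical, not part of Spark's API.

```java
// Plain-Java sketch of the documented naming rule for
// DStream.saveAsTextFiles(prefix, suffix): "prefix-TIME_IN_MS[.suffix]".
public class BatchOutputNames {

    // Build the path one batch would be saved under, given its batch time in ms.
    // An empty prefix or suffix is simply left out, as in the documented pattern.
    static String batchOutputPath(String prefix, String suffix, long batchTimeMs) {
        String result = Long.toString(batchTimeMs);
        if (prefix != null && !prefix.isEmpty()) {
            result = prefix + "-" + result;
        }
        if (suffix != null && !suffix.isEmpty()) {
            result = result + "." + suffix;
        }
        return result;
    }

    public static void main(String[] args) {
        String prefix = "hdfs://HadoopSystem-150s:8020/Spark_Twitter_out";
        long batchMs = 60000L;            // matches new Duration(60000) in the question
        long firstBatch = 1400000000000L; // hypothetical batch timestamp
        for (int i = 0; i < 3; i++) {
            System.out.println(batchOutputPath(prefix, "txt", firstBatch + i * batchMs));
        }
    }
}
```

Each generated path is a directory holding part-NNNNN files, because the underlying RDD save writes one file per partition.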