RDD Save as Text File

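How can I use RDD.saveAsTextFile to save a delimited text file? I also need to write the DataFrame column names as headers. How can I achieve this?

For a large RDD, is there a simpler way than the following?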

// collect() pulls every row to the driver, so the data must fit in driver memory.
List<Row> data = resultFrame.toJavaRDD().collect();
// try-with-resources closes the writer even if a write fails;
// FileWriter creates the file if it does not exist yet.
try (BufferedWriter bufferedWriter = new BufferedWriter(new FileWriter(fileName))) {
  for (Row dataRow : data) {
    StringBuilder row = new StringBuilder();
    for (int i = 0; i < dataRow.size(); i++) {
      row.append(dataRow.get(i));
      // Separate fields with "~", but do not append it after the last field.
      if (i != dataRow.size() - 1) {
        row.append("~");
      }
    }
    bufferedWriter.write(row.toString());
    bufferedWriter.write("\n");
  }
} catch (IOException e) {
  // Pass the exception along so the cause is not lost.
  LOGGER.error("Error in writing to the ruf file", e);
}

Just as you read with SQLContext.read (Java API), you need to write with DataFrame.write (Java API).

The other approaches are deprecated (for example SQLContext.parquetFile and SQLContext.jsonFile).

Thanks for the reply. The following works:

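import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.Row;

// Maps each Row to a single line with its fields joined by "~".
public class TildaDelimiter implements Function<Row, String> {

  @Override
  public String call(Row r) {
    return r.mkString("~");
  }
}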

In my save step I did the following to write a ~-delimited file:

 resultFrame.toJavaRDD().map(new TildaDelimiter()).coalesce(1, true)
            .saveAsTextFile(folderName);
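
The snippet above does not write the column names that the question also asked for. As a rough illustration only, here is a plain-Java sketch (no Spark involved) of how a header line and the data lines could be built with the same "~" delimiter. DelimitedLines, join and withHeader are hypothetical names made up for this example; they are not part of Spark.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Spark: builds the lines of a delimited
// text file with a header row, using only the Java standard library.
final class DelimitedLines {

  // Joins the values with the delimiter; a null value is written as "null".
  static String join(List<?> values, String delimiter) {
    StringBuilder line = new StringBuilder();
    for (int i = 0; i < values.size(); i++) {
      if (i > 0) {
        line.append(delimiter);
      }
      line.append(values.get(i));
    }
    return line.toString();
  }

  // Returns the header line followed by one line per data row.
  static List<String> withHeader(List<String> columns,
                                 List<List<Object>> rows,
                                 String delimiter) {
    List<String> lines = new ArrayList<>();
    lines.add(join(columns, delimiter));
    for (List<Object> row : rows) {
      lines.add(join(row, delimiter));
    }
    return lines;
  }
}
```

In the Spark job, the column names would presumably come from resultFrame.columns(), and the header line would have to be placed ahead of the data. That is only straightforward when the output ends up in a single partition, as with the coalesce(1, true) call above.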