How to load JSON (path saved in csv) with Spark?

Question

I'm new to Spark. I can load a single .json file in Spark, but what if a folder contains thousands of .json files? [picture of .json files in the folder]

I also have a csv file that classifies the .json files with labels. [picture of csv file]

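How can I use Spark to load this data and save the result? For example, I want to load the first entry in the csv, but that entry is only text: it gives the name of a .json file. I want to load that .json file and then save the output, so that I can see the JSON content of the first graph labelled Trusted.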

Answer 1

For JSON:

jsonRDD = sql_context.read.json("path/to/json_folder/")  # reads every JSON file in the folder; the result is a DataFrame

For CSV, use Databricks' spark-csv package.

Install spark-csv, then:

csvRDD = sql_context.read.load("path/to/csv_folder/", format='com.databricks.spark.csv', header='true', inferSchema='true')  # header row gives column names; column types are inferred
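To connect the two reads, one option is to take the file names for a given label from the csv and pass only those paths to the JSON reader. The following is a minimal sketch, not a definitive solution. It assumes Spark 2.0 or later (where `spark.read.csv` is built in and spark-csv is not needed) and a csv with columns named `filename` and `label`. Those column names, the helper names, and all paths are hypothetical, since the actual csv layout is not shown in the question.

```python
import os


def build_json_paths(rows, json_dir, label, name_col="filename", label_col="label"):
    """Return the full paths of the JSON files whose csv row carries `label`.

    `rows` can be any sequence of mappings indexable by column name,
    e.g. the pyspark Row objects returned by DataFrame.collect().
    The column names are assumptions about the csv layout.
    """
    return [
        os.path.join(json_dir, row[name_col])
        for row in rows
        if row[label_col] == label
    ]


def load_json_for_label(spark, csv_path, json_dir, label):
    """Load only the JSON files that the csv marks with `label`."""
    # Imported here so the helper above stays usable without pyspark.
    from pyspark.sql.functions import input_file_name

    # The label csv is small (one row per JSON file), so collecting it
    # to the driver is reasonable.
    labels_df = spark.read.csv(csv_path, header=True, inferSchema=True)
    paths = build_json_paths(labels_df.collect(), json_dir, label)
    if not paths:
        raise ValueError("no files found for label %r" % label)

    # read.json accepts a list of paths; input_file_name() records which
    # file each row came from.
    return spark.read.json(paths).withColumn("source_file", input_file_name())
```

With an existing SparkSession named `spark`, usage could look like `df = load_json_for_label(spark, "path/to/labels.csv", "path/to/json_folder", "Trusted")`, followed by `df.write.mode("overwrite").json("path/to/output")` to save the result.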
