How to load a dataset from a String in Spark
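From the Spark documentation, I know that I can load a dataset from a libsvm-formatted file.
However, I want to run my code on a remote Spark cluster, so I hard-coded the iris dataset into my code and would like to load it directly from this String object.
But when I looked at the DataFrameReader object, I found no API that supports loading a dataset directly from a String.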
I tried it this way:
import org.apache.spark.sql.Encoders

val irisData =
"""
|"sepal_length","sepal_width","petal_length","petal_width","label"
|5.1,3.5,1.4,0.2,Iris-setosa
|4.9,3.0,1.4,0.2,Iris-setosa
|4.7,3.2,1.3,0.2,Iris-setosa
|4.6,3.1,1.5,0.2,Iris-setosa
""".stripMargin
println(irisData)
"sepal_length","sepal_width","petal_length","petal_width","label"
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
val stringDS = spark.createDataset(irisData.split("\n"))(Encoders.STRING)
val irisDatasetDF = spark.read
  .option("inferSchema", "true")
  .option("header", "true")
  .csv(stringDS)
irisDatasetDF.show(false)
+------------+-----------+------------+-----------+-----------+
|sepal_length|sepal_width|petal_length|petal_width|label      |
+------------+-----------+------------+-----------+-----------+
|5.1         |3.5        |1.4         |0.2        |Iris-setosa|
|4.9         |3.0        |1.4         |0.2        |Iris-setosa|
|4.7         |3.2        |1.3         |0.2        |Iris-setosa|
|4.6         |3.1        |1.5         |0.2        |Iris-setosa|
+------------+-----------+------------+-----------+-----------+
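
For reuse, the same approach can be wrapped in a small helper. This is only a sketch under a few assumptions: the names CsvFromString and loadCsvFromString are hypothetical, the local[*] master is just for trying it out (on a remote cluster you would use the session you already have), and it relies on the DataFrameReader.csv overload that accepts a Dataset[String], which is available from Spark 2.2 onward.

```scala
import org.apache.spark.sql.{DataFrame, Encoders, SparkSession}

object CsvFromString {
  // Hypothetical helper: parses CSV text held in memory into a DataFrame.
  def loadCsvFromString(spark: SparkSession, csv: String): DataFrame = {
    // Split on line breaks and drop blank lines, such as the one
    // left behind by the opening triple quote.
    val lines = csv.split("\\r?\\n").map(_.trim).filter(_.nonEmpty).toSeq
    val ds = spark.createDataset(lines)(Encoders.STRING)
    spark.read
      .option("header", "true")      // first non-empty line holds the column names
      .option("inferSchema", "true") // let Spark guess the column types
      .csv(ds)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for experimenting.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("csv-from-string")
      .getOrCreate()

    val irisData =
      """
        |"sepal_length","sepal_width","petal_length","petal_width","label"
        |5.1,3.5,1.4,0.2,Iris-setosa
        |4.9,3.0,1.4,0.2,Iris-setosa
        |""".stripMargin

    loadCsvFromString(spark, irisData).show(false)
    spark.stop()
  }
}
```

Note that inferSchema makes Spark take an extra pass over the data; for a larger hard-coded dataset it may be preferable to pass an explicit schema with .schema(...) instead.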