How to manually create a Dataset with a Set column in Scala
I am trying to manually create a Dataset with a column of type Set:
case class Files(Record: String, ids: Set)
val files = Seq(
  Files("202110260931", Set(770010, 770880)),
  Files("202110260640", Set(770010, 770880)),
  Files("202110260715", Set(770010, 770880))
).toDS()
files.show()
This gives me the error:
>command-1888379816641405:10: error: type Set takes type parameters
case class Files(s3path: String, ids: Set)
What am I doing wrong?
Set is a parameterized type, so when you declare it in the Files case class you must specify its type parameter, e.g. Set[Int] for a set of integers. So your Files case class definition should be:
case class Files(Record: String, ids: Set[Int])
So the complete code to create a Dataset with a Set column is:
import org.apache.spark.sql.SparkSession

object ToDataset {

  private val spark = SparkSession.builder()
    .master("local[*]")
    .appName("test-app")
    .config("spark.ui.enabled", "false")
    .config("spark.driver.host", "localhost")
    .getOrCreate()

  def main(args: Array[String]): Unit = {
    import spark.implicits._
    val files = Seq(
      Files("202110260931", Set(770010, 770880)),
      Files("202110260640", Set(770010, 770880)),
      Files("202110260715", Set(770010, 770880))
    ).toDS()
    files.show()
  }

  case class Files(Record: String, ids: Set[Int])
}
which returns the following Dataset:
+------------+----------------+
| Record| ids|
+------------+----------------+
|202110260931|[770010, 770880]|
|202110260640|[770010, 770880]|
|202110260715|[770010, 770880]|
+------------+----------------+
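Worth noting: Spark SQL has no native set type, so the encoder stores the Set[Int] column as an array column, which is why the rows print as [770010, 770880]. A quick sketch to confirm, assuming the same files Dataset built in main above:

// Sketch: inspect how the Set[Int] column is encoded.
// Spark maps Set[Int] to an array-of-integers column in the schema.
files.printSchema()

// Collecting a row decodes the array back into a Scala Set,
// so set semantics (e.g. deduplication, contains) are preserved on the JVM side:
val firstIds: Set[Int] = files.head().ids
println(firstIds.contains(770010))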