Writing results of a SQL query to a DataFrame with Scala fails in Databricks
Just running this Spark SQL query in Databricks works fine:
%sql
select CONCAT(`tsArr[1]`,"-", `tsArr[0]`,"-", `tsArr[2]`," ", `tsArr[3]`) as time,
cast (context._function as string) as funct,
cast (context._param as string) as param,
cast(context._value as string) as value from clickstreamDF
lateral view explode(Context) as context
This outputs:
time                 funct  param         value
11-27-2017 08:20:33  Open   location      3424
11-27-2017 08:20:33  Open   Company Id    testinc
11-27-2017 08:20:33  Open   Channel Info  1
11-27-2017 08:20:33  Open   UserAgent     jack
11-27-2017 08:20:33  Open   Language      english
But when I want to put the query result into a DataFrame, like this:
%scala
val df_header = spark.sql(s"select CONCAT(`tsArr[1]`,"-", `tsArr[0]`,"-", `tsArr[2]`," ", `tsArr[3]`) as time,
cast (context._function as string) as funct,
cast (context._param as string) as param,
cast(context._value as string) as value
from clickstreamDF lateral view explode(Context) as context")
df_header.createOrReplaceTempView("clickstreamDF")
then it fails. It says:
error: ')' expected but string literal found.
I guess it has to do with the "-" and " ". I have tried replacing or escaping the "" with '' or ``, or leaving them out completely, but with no result.
What am I doing wrong?
Regards,
D.
To avoid ambiguity between the quote character used to enclose the entire Spark SQL string (i.e. ") and the quotes used inside the SQL statement itself, use triple quotes (""") as the enclosing quotes. You also need to remove the backticks surrounding those tsArr[] references, as in the following example:
import org.apache.spark.sql.functions._
import spark.implicits._

// A context struct mirroring the question's schema
case class CT(_function: String, _param: String, _value: String)

// Sample data: a date/time split into an array, plus an array of contexts
val clickstreamDF = Seq(
  (Seq("27", "11", "2017", "08:20:33"), Seq(CT("f1", "p1", "v1"), CT("f2", "p2", "v2"))),
  (Seq("28", "12", "2017", "09:30:44"), Seq(CT("f3", "p3", "v3")))
).toDF("tsArr", "contexts")

clickstreamDF.createOrReplaceTempView("clickstreamTable")
val df_header = spark.sql("""
select
concat(tsArr[1], "-", tsArr[0], "-", tsArr[2], " ", tsArr[3]) as time,
cast(context._function as string) as funct,
cast(context._param as string) as param,
cast(context._value as string) as value
from
clickstreamTable lateral view explode(contexts) as context
""")
df_header.show
// +-------------------+-----+-----+-----+
// | time|funct|param|value|
// +-------------------+-----+-----+-----+
// |11-27-2017 08:20:33| f1| p1| v1|
// |11-27-2017 08:20:33| f2| p2| v2|
// |12-28-2017 09:30:44| f3| p3| v3|
// +-------------------+-----+-----+-----+
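As a side note, Spark SQL also accepts single-quoted string literals, so if you would rather not use triple quotes you can keep an ordinary double-quoted Scala string. A minimal sketch of that alternative (df_header2 is just an illustrative name):

// Alternative: single-quote the SQL string literals so they don't clash
// with the double quotes that delimit the Scala string.
val df_header2 = spark.sql(
  "select concat(tsArr[1], '-', tsArr[0], '-', tsArr[2], ' ', tsArr[3]) as time, " +
  "cast(context._function as string) as funct, " +
  "cast(context._param as string) as param, " +
  "cast(context._value as string) as value " +
  "from clickstreamTable lateral view explode(contexts) as context")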
By the way, you may want to consider using the DataFrame API, since you already have the data in a DataFrame:
val df_header = clickstreamDF.
withColumn("time",
concat($"tsArr"(1), lit("-"), $"tsArr"(0), lit("-"), $"tsArr"(2), lit(" "), $"tsArr"(3))
).
withColumn("context", explode($"contexts")).
select($"time",
$"context._function".cast("String").as("funct"),
$"context._param".cast("String").as("param"),
$"context._value".cast("String").as("value")
)
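For completeness, selectExpr offers a middle ground: you stay in the DataFrame API but keep SQL expression strings, with no temp view needed. A minimal sketch under the same schema assumptions (df_header3 is just an illustrative name):

// Same transformation via selectExpr; single quotes inside the SQL
// expressions again avoid any clash with Scala's double quotes.
val df_header3 = clickstreamDF.
  withColumn("context", explode($"contexts")).
  selectExpr(
    "concat(tsArr[1], '-', tsArr[0], '-', tsArr[2], ' ', tsArr[3]) as time",
    "cast(context._function as string) as funct",
    "cast(context._param as string) as param",
    "cast(context._value as string) as value")

All of these variants should produce the same output as the %sql cell above.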