Spark SQL - 转义查询字符串
Spark SQL - Escape Query String
我不敢相信我会问这个但是...
如何使用 SCALA 在 SPARK SQL 中转义 SQL 查询字符串?
百无聊赖,四处寻找。我以为 apache commons 库可以做到,但运气不好:
import org.apache.commons.lang.StringEscapeUtils
var sql = StringEscapeUtils.escapeSql("'Ulmus_minor_'Toledo'");
df.filter("topic = '" + sql + "'").map(_.getValuesMap[Any](List("hits","date"))).collect().foreach(println);
returns 以下:
topic = '''Ulmus_minor_''Toledo'''
^ at scala.sys.package$.error(package.scala:27) at org.apache.spark.sql.catalyst.SqlParser.parseExpression(SqlParser.scala:45)
at org.apache.spark.sql.DataFrame.filter(DataFrame.scala:651) at
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:29) at
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:34) at
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:36) at
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:38) at
$iwC$$iwC$$iwC$$iwC$$iwC.(:40) at
$iwC$$iwC$$iwC$$iwC.(:42) at
$iwC$$iwC$$iwC.(:44) at $iwC$$iwC.(:46)
at $iwC.(:48) at (:50) at
.(:54) at .() at
.(:7) at .() at $print()
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497) at
org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at
org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
at
org.apache.spark.repl.SparkIMain.loadAndRunReq(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at
org.apache.spark.repl.SparkILoop.reallyInterpret(SparkILoop.scala:857)
at
org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814) at
org.apache.spark.repl.SparkILoop.processLine(SparkILoop.scala:657)
at org.apache.spark.repl.SparkILoop.innerLoop(SparkILoop.scala:665)
at
org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
at
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process.apply$mcZ$sp(SparkILoop.scala:997)
at
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process.apply(SparkILoop.scala:945)
at
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process.apply(SparkILoop.scala:945)
at
scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at
org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at org.apache.spark.repl.Main$.main(Main.scala:31) at
org.apache.spark.repl.Main.main(Main.scala) at
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497) at
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:665)
at
org.apache.spark.deploy.SparkSubmit$.doRunMain(SparkSubmit.scala:170)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:193)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
帮助会很大。
j
这可能令人惊讶但是:
var sql = "'Ulmus_minor_'Toledo'"
df.filter(s"""topic = "$sql"""")
工作得很好,虽然使用这个会更干净:
df.filter($"topic" <=> sql)
问题的标题是关于在 SparkSQL 中转义字符串的一般情况,因此提供适用于任何字符串的答案可能会有好处,无论它在表达式中的使用方式如何。
def sqlEscape(s: String) =
org.apache.spark.sql.catalyst.expressions.Literal(s).sql
sqlEscape("'Ulmus_minor_'Toledo' and \"om\"")
res0: String = '\'Ulmus_minor_\'Toledo\' and "om"'
我不敢相信我会问这个但是...
如何使用 SCALA 在 SPARK SQL 中转义 SQL 查询字符串?
百无聊赖,四处寻找。我以为 apache commons 库可以做到,但运气不好:
import org.apache.commons.lang.StringEscapeUtils var sql = StringEscapeUtils.escapeSql("'Ulmus_minor_'Toledo'"); df.filter("topic = '" + sql + "'").map(_.getValuesMap[Any](List("hits","date"))).collect().foreach(println);
returns 以下:
topic = '''Ulmus_minor_''Toledo''' ^ at scala.sys.package$.error(package.scala:27) at org.apache.spark.sql.catalyst.SqlParser.parseExpression(SqlParser.scala:45) at org.apache.spark.sql.DataFrame.filter(DataFrame.scala:651) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:29) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:34) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:36) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:38) at $iwC$$iwC$$iwC$$iwC$$iwC.(:40) at $iwC$$iwC$$iwC$$iwC.(:42) at $iwC$$iwC$$iwC.(:44) at $iwC$$iwC.(:46) at $iwC.(:48) at (:50) at .(:54) at .() at .(:7) at .() at $print() at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338) at org.apache.spark.repl.SparkIMain.loadAndRunReq(SparkIMain.scala:840) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819) at org.apache.spark.repl.SparkILoop.reallyInterpret(SparkILoop.scala:857) at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902) at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814) at org.apache.spark.repl.SparkILoop.processLine(SparkILoop.scala:657) at org.apache.spark.repl.SparkILoop.innerLoop(SparkILoop.scala:665) at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process.apply$mcZ$sp(SparkILoop.scala:997) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process.apply(SparkILoop.scala:945) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process.apply(SparkILoop.scala:945) at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945) at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059) at org.apache.spark.repl.Main$.main(Main.scala:31) at org.apache.spark.repl.Main.main(Main.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:665) at org.apache.spark.deploy.SparkSubmit$.doRunMain(SparkSubmit.scala:170) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:193) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
帮助会很大。
j
这可能令人惊讶但是:
var sql = "'Ulmus_minor_'Toledo'"
df.filter(s"""topic = "$sql"""")
工作得很好,虽然使用这个会更干净:
df.filter($"topic" <=> sql)
问题的标题是关于在 SparkSQL 中转义字符串的一般情况,因此提供适用于任何字符串的答案可能会有好处,无论它在表达式中的使用方式如何。
def sqlEscape(s: String) =
org.apache.spark.sql.catalyst.expressions.Literal(s).sql
sqlEscape("'Ulmus_minor_'Toledo' and \"om\"")
res0: String = '\'Ulmus_minor_\'Toledo\' and "om"'