How to use the unbase64 function in a PySpark SQL query?
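I can't figure out why the unbase64 function isn't working in my Spark SQL query.
Here is an example. I'm trying to decode "VGhpcyBpcyBhIHRlc3Qh" by calling the unbase64 function in Spark SQL. Any ideas why the output is not decoded? Thanks.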
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.functions import unbase64
sc = SparkContext("local", "Simple App")
sqlContext = SQLContext(sc)
log = [{"eventTime":"2015-12-14 15:27:00","id":"9ab0135f-b8a3-4312-9065-9f8874fd790c","fullLog":"VGhpcyBpcyBhIHRlc3Qh"}]
df = sqlContext.createDataFrame(log)
df.registerTempTable('data')
query = sqlContext.sql('SELECT unbase64(fullLog) as test FROM data')
query.write.save("output", format="json")
The output is: {"test":"VGhpcyBpcyBhIHRlc3Qh"}
when I want it to be: {"test":"This is a test!"}
It seems to work for me...
# sc and sqlContext are the SparkContext and SQLContext created in the question.
log = [("2015-12-14 15:27:00","9ab0135f-b8a3-4312-9065-9f8874fd790c","VGhpcyBpcyBhIHRlc3Qh")]
rdd_log = sc.parallelize(log)
df = sqlContext.createDataFrame(rdd_log, ["eventTime", "id", "fullLog"])
df.registerTempTable("data")
query = sqlContext.sql('SELECT unbase64(fullLog) as test FROM data')
# unbase64 returns a binary column, so cast it to string to get readable text.
query = query.select(query.test.cast("string").alias('test'))
print(query.collect())
>> [Row(test=u'This is a test!')]
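A likely explanation for the difference: unbase64 returns binary data rather than a string, and when a binary column is written out as JSON it is serialized as base64 text again, which reproduces the original input. The sketch below imitates that round trip with Python's standard library only. It is an illustration of the behaviour, not Spark's actual writer code, and the variable names are made up for this example.

```python
import base64
import json

encoded = "VGhpcyBpcyBhIHRlc3Qh"

# Decoding base64 yields raw bytes (a binary column in Spark), not text.
raw = base64.b64decode(encoded)

# Assumption: JSON has no native bytes type, so a writer that receives
# binary data emits it as base64 text, which gives back the input string.
as_json_binary = json.dumps({"test": base64.b64encode(raw).decode("ascii")})

# Casting to string first (here: decoding the bytes as UTF-8) keeps the text.
as_json_string = json.dumps({"test": raw.decode("utf-8")})

print(as_json_binary)
print(as_json_string)
```

In the query itself, the same cast can presumably be done in SQL with CAST(unbase64(fullLog) AS STRING) before writing the result, instead of the separate select shown above.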