Error when running a query involving the ROUND function in Spark SQL
I am trying to get a new column in PySpark by rounding one column of a table to the precision specified by another column of the same table. For example, from the following table:
+--------+--------+
| Data|Rounding|
+--------+--------+
|3.141592| 3|
|0.577215| 1|
+--------+--------+
I should be able to obtain a result like:
+--------+--------+--------------+
| Data|Rounding|Rounded_Column|
+--------+--------+--------------+
|3.141592| 3| 3.142|
|0.577215| 1| 0.6|
+--------+--------+--------------+
In particular, I tried the following code:
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.types import (
    StructType, StructField, FloatType, IntegerType
)

pdDF = pd.DataFrame(columns=["Data", "Rounding"],
                    data=[[3.141592, 3], [0.577215, 1]])

mySchema = StructType([StructField("Data", FloatType(), True),
                       StructField("Rounding", IntegerType(), True)])

spark = (SparkSession.builder
         .master("local")
         .appName("column rounding")
         .getOrCreate())

df = spark.createDataFrame(pdDF, schema=mySchema)
df.show()

df.createOrReplaceTempView("df_table")

df_rounded = spark.sql(
    "SELECT Data, Rounding, ROUND(Data, Rounding) AS Rounded_Column FROM df_table")
df_rounded.show()
but I get the following error:
raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: u"cannot resolve 'round(df_table.`Data`, df_table.`Rounding`)' due to data type mismatch: Only foldable Expression is allowed for scale arguments; line 1 pos 23;\n'Project [Data#0, Rounding#1, round(Data#0, Rounding#1) AS Rounded_Column#12]\n+- SubqueryAlias df_table\n +- LogicalRDD [Data#0, Rounding#1], false\n"
Any help would be greatly appreciated :)
When you use Spark SQL, Catalyst throws the following error for your query: Only foldable Expression is allowed for scale arguments.
That is, per the Scaladoc for round: @param scale new scale to be round to, this should be a constant int at runtime
ROUND only accepts a literal for the scale argument. You can try writing custom code (a UDF) instead of the plain Spark SQL approach.
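For reference, the same query works as soon as the scale is a constant, because a literal is foldable. A minimal PySpark sketch against the df_table view defined above (the alias Rounded_3 is just for illustration):

# Works: the scale 3 is a literal, hence a foldable expression
spark.sql("SELECT Data, ROUND(Data, 3) AS Rounded_3 FROM df_table").show()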
EDIT:
Using a UDF:
val df = Seq(
  (3.141592, 3),
  (0.577215, 1)
).toDF("Data", "Rounding")
df.show()
df.createOrReplaceTempView("df_table")

// Round `customvalue` to `customscale` decimal places, rounding half up
def RoundUDF(customvalue: Double, customscale: Int): Double =
  BigDecimal(customvalue).setScale(customscale, BigDecimal.RoundingMode.HALF_UP).toDouble
spark.udf.register("RoundUDF", RoundUDF(_: Double, _: Int): Double)

val df_rounded = spark.sql("select Data, Rounding, RoundUDF(Data, Rounding) as Rounded_Column from df_table")
df_rounded.show()
Input:
+--------+--------+
| Data|Rounding|
+--------+--------+
|3.141592| 3|
|0.577215| 1|
+--------+--------+
Output:
+--------+--------+--------------+
| Data|Rounding|Rounded_Column|
+--------+--------+--------------+
|3.141592| 3| 3.142|
|0.577215| 1| 0.6|
+--------+--------+--------------+
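Since the question is in PySpark, a Python equivalent would be a sketch like the one below. It mirrors the Scala UDF above using Python's decimal module so the HALF_UP behavior matches; the function name round_half_up is my own choice, not part of any Spark API:

from decimal import Decimal, ROUND_HALF_UP
from pyspark.sql.types import DoubleType

# Round `value` to `scale` decimal places, rounding half up like the Scala UDF
def round_half_up(value, scale):
    if value is None or scale is None:
        return None
    quantum = Decimal(1).scaleb(-scale)  # e.g. scale=3 -> Decimal('0.001')
    return float(Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP))

spark.udf.register("RoundUDF", round_half_up, DoubleType())
df_rounded = spark.sql(
    "SELECT Data, Rounding, RoundUDF(Data, Rounding) AS Rounded_Column FROM df_table")
df_rounded.show()

Note that a Python UDF runs row by row outside the JVM, so it is slower than the built-in ROUND; it is only needed here because the scale varies per row.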