spark dataframe pivoting is throwing AssertionError: assertion failed: unsafe symbol Unstable
spark dataframe pivoting is throwing AssertionError: assertion failed: unsafe symbol Unstable
我有一个数据框,即 resultDf,如下所示
+---------------+-------------------+--------------------+-------------------+-------------------+--------------+-----------------------+----------------------+
|model_family_id|classification_type|classification_value|benchmark_type_code| data_date|data_item_code|data_item_value_numeric|data_item_value_string|
+---------------+-------------------+--------------------+-------------------+-------------------+--------------+-----------------------+----------------------+
| 1| COUNTRY| AGO| MEAN|2018-03-31 00:00:00| CREDITSCORE| 15| b|
| 1| COUNTRY| AGO| OBS_CNT|2018-03-31 00:00:00| CREDITSCORE| 4| b|
| 1| COUNTRY| AGO| OBS_CNT_CA|2018-03-31 00:00:00| CREDITSCORE| 4| null|
| 1| COUNTRY| AGO| PERCENTILE_0|2018-03-31 00:00:00| CREDITSCORE| 15| b|
| 1| COUNTRY| AGO| PERCENTILE_10|2018-03-31 00:00:00| CREDITSCORE| 15| b|
| 1| COUNTRY| AGO| PERCENTILE_100|2018-03-31 00:00:00| CREDITSCORE| 15| b|
| 1| COUNTRY| AGO| PERCENTILE_25|2018-03-31 00:00:00| CREDITSCORE| 15| b|
| 1| COUNTRY| AGO| PERCENTILE_50|2018-03-31 00:00:00| CREDITSCORE| 15| b|
| 1| COUNTRY| AGO| PERCENTILE_75|2018-03-31 00:00:00| CREDITSCORE| 15| b|
| 1| COUNTRY| AGO| PERCENTILE_90|2018-03-31 00:00:00| CREDITSCORE| 15| b|
+---------------+-------------------+--------------------+-------------------+-------------------+--------------+-----------------------+----------------------+
我根据 "benchmark_type_code" 列旋转 table,
需要实现以下业务逻辑
如果(data_item_code) 是"SCORE" 或"PG_SCORE"
====> 选择 data_item_value_string 作为值
别的
==> 选择 data_item_value_numeric 作为值
为此我写了下面的代码
val pivot_resultDf = resultDf.groupBy("model_family_id","classification_type","classification_value" ,"benchmark_type_code","data_date")
.pivot("benchmark_type_code")
.agg( first(
when( col("data_item_code").===("SCORE"), col("data_item_value_numeric"))
.otherwise(col("data_item_value_string"))
) )
但是我在聚合函数@when condition
中遇到错误
java.lang.AssertionError: assertion failed: unsafe symbol Unstable (child of <none>) in runtime reflection universe
at scala.reflect.internal.Symbols$Symbol.<init>(Symbols.scala:205)
at scala.reflect.internal.Symbols$TypeSymbol.<init>(Symbols.scala:3030)
at scala.reflect.internal.Symbols$Symbol.newStubSymbol(Symbols.scala:521)
at scala.reflect.internal.pickling.UnPickler$Scan.readExtSymbol(UnPickler.scala:258)
at scala.reflect.internal.pickling.UnPickler$Scan.readSymbol(UnPickler.scala:286)
at scala.reflect.runtime.JavaMirrors$JavaMirror.unpickleClass(JavaMirrors.scala:619)
at scala.reflect.runtime.SymbolLoaders$TopClassCompleter$$anonfun$complete.apply$mcV$sp(SymbolLoaders.scala:28)
at scala.reflect.runtime.SymbolLoaders$TopClassCompleter$$anonfun$complete.apply(SymbolLoaders.scala:25)
at scala.reflect.runtime.SymbolLoaders$TopClassCompleter$$anonfun$complete.apply(SymbolLoaders.scala:25)
at scala.reflect.internal.SymbolTable.slowButSafeEnteringPhaseNotLaterThan(SymbolTable.scala:263)
at scala.reflect.runtime.SymbolLoaders$TopClassCompleter.complete(SymbolLoaders.scala:25)
at scala.reflect.internal.Symbols$Symbol.info(Symbols.scala:1535)
at org.apache.spark.sql.catalyst.expressions.Literal$.create(literals.scala:158)
at org.apache.spark.sql.functions$.typedLit(functions.scala:113)
at org.apache.spark.sql.functions$.lit(functions.scala:96)
at org.apache.spark.sql.Column.$eq$eq$eq(Column.scala:262)
我在这里做错了什么?如何解决这个问题?
我不确定您为什么会收到断言错误,但我能够成功获得结果。通常断言错误是句法错误。请检查行尾并尝试在 spark shell 上执行以查看真正的差距在哪里。
找到显示我能够获得所需结果的屏幕截图。
这是有效的
.agg( 首先(
当( col("data").isin("x","a","y","z") ,
当( col("code").isin("aa","bb") , col("numeric")).否则(col("string"))
)
.否则(col("numeric"))
)
我有一个数据框,即 resultDf,如下所示
+---------------+-------------------+--------------------+-------------------+-------------------+--------------+-----------------------+----------------------+
|model_family_id|classification_type|classification_value|benchmark_type_code| data_date|data_item_code|data_item_value_numeric|data_item_value_string|
+---------------+-------------------+--------------------+-------------------+-------------------+--------------+-----------------------+----------------------+
| 1| COUNTRY| AGO| MEAN|2018-03-31 00:00:00| CREDITSCORE| 15| b|
| 1| COUNTRY| AGO| OBS_CNT|2018-03-31 00:00:00| CREDITSCORE| 4| b|
| 1| COUNTRY| AGO| OBS_CNT_CA|2018-03-31 00:00:00| CREDITSCORE| 4| null|
| 1| COUNTRY| AGO| PERCENTILE_0|2018-03-31 00:00:00| CREDITSCORE| 15| b|
| 1| COUNTRY| AGO| PERCENTILE_10|2018-03-31 00:00:00| CREDITSCORE| 15| b|
| 1| COUNTRY| AGO| PERCENTILE_100|2018-03-31 00:00:00| CREDITSCORE| 15| b|
| 1| COUNTRY| AGO| PERCENTILE_25|2018-03-31 00:00:00| CREDITSCORE| 15| b|
| 1| COUNTRY| AGO| PERCENTILE_50|2018-03-31 00:00:00| CREDITSCORE| 15| b|
| 1| COUNTRY| AGO| PERCENTILE_75|2018-03-31 00:00:00| CREDITSCORE| 15| b|
| 1| COUNTRY| AGO| PERCENTILE_90|2018-03-31 00:00:00| CREDITSCORE| 15| b|
+---------------+-------------------+--------------------+-------------------+-------------------+--------------+-----------------------+----------------------+
我根据 "benchmark_type_code" 列旋转 table, 需要实现以下业务逻辑
如果(data_item_code) 是"SCORE" 或"PG_SCORE" ====> 选择 data_item_value_string 作为值 别的 ==> 选择 data_item_value_numeric 作为值
为此我写了下面的代码
val pivot_resultDf = resultDf.groupBy("model_family_id","classification_type","classification_value" ,"benchmark_type_code","data_date")
.pivot("benchmark_type_code")
.agg( first(
when( col("data_item_code").===("SCORE"), col("data_item_value_numeric"))
.otherwise(col("data_item_value_string"))
) )
但是我在聚合函数@when condition
中遇到错误
java.lang.AssertionError: assertion failed: unsafe symbol Unstable (child of <none>) in runtime reflection universe
at scala.reflect.internal.Symbols$Symbol.<init>(Symbols.scala:205)
at scala.reflect.internal.Symbols$TypeSymbol.<init>(Symbols.scala:3030)
at scala.reflect.internal.Symbols$Symbol.newStubSymbol(Symbols.scala:521)
at scala.reflect.internal.pickling.UnPickler$Scan.readExtSymbol(UnPickler.scala:258)
at scala.reflect.internal.pickling.UnPickler$Scan.readSymbol(UnPickler.scala:286)
at scala.reflect.runtime.JavaMirrors$JavaMirror.unpickleClass(JavaMirrors.scala:619)
at scala.reflect.runtime.SymbolLoaders$TopClassCompleter$$anonfun$complete.apply$mcV$sp(SymbolLoaders.scala:28)
at scala.reflect.runtime.SymbolLoaders$TopClassCompleter$$anonfun$complete.apply(SymbolLoaders.scala:25)
at scala.reflect.runtime.SymbolLoaders$TopClassCompleter$$anonfun$complete.apply(SymbolLoaders.scala:25)
at scala.reflect.internal.SymbolTable.slowButSafeEnteringPhaseNotLaterThan(SymbolTable.scala:263)
at scala.reflect.runtime.SymbolLoaders$TopClassCompleter.complete(SymbolLoaders.scala:25)
at scala.reflect.internal.Symbols$Symbol.info(Symbols.scala:1535)
at org.apache.spark.sql.catalyst.expressions.Literal$.create(literals.scala:158)
at org.apache.spark.sql.functions$.typedLit(functions.scala:113)
at org.apache.spark.sql.functions$.lit(functions.scala:96)
at org.apache.spark.sql.Column.$eq$eq$eq(Column.scala:262)
我在这里做错了什么?如何解决这个问题?
我不确定您为什么会收到断言错误,但我能够成功获得结果。通常断言错误是句法错误。请检查行尾并尝试在 spark shell 上执行以查看真正的差距在哪里。 找到显示我能够获得所需结果的屏幕截图。
这是有效的
.agg( 首先( 当( col("data").isin("x","a","y","z") , 当( col("code").isin("aa","bb") , col("numeric")).否则(col("string")) ) .否则(col("numeric")) )