java.lang.NumberFormatException: For input string: "inf" when reading from Snowflake with Spark
I have a Snowflake table with a column of doubles. Some of the values are inf and -inf.
When I try to read this table in Spark, the job fails with the following error:
java.lang.NumberFormatException: For input string: "inf"
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
at java.lang.Double.parseDouble(Double.java:538)
at scala.collection.immutable.StringLike$class.toDouble(StringLike.scala:285)
at scala.collection.immutable.StringOps.toDouble(StringOps.scala:29)
at net.snowflake.spark.snowflake.Conversions$$anonfun.apply(Conversions.scala:156)
at net.snowflake.spark.snowflake.Conversions$$anonfun.apply(Conversions.scala:144)
at scala.collection.TraversableLike$$anonfun$map.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map.apply(TraversableLike.scala:234)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
at net.snowflake.spark.snowflake.Conversions$.net$snowflake$spark$snowflake$Conversions$$convertRow(Conversions.scala:144)
at net.snowflake.spark.snowflake.Conversions$$anonfun$createRowConverter.apply(Conversions.scala:132)
at net.snowflake.spark.snowflake.Conversions$$anonfun$createRowConverter.apply(Conversions.scala:132)
at net.snowflake.spark.snowflake.CSVConverter$$anonfun$convert.apply(CSVConverter.scala:73)
at net.snowflake.spark.snowflake.CSVConverter$$anonfun$convert.apply(CSVConverter.scala:34)
at scala.collection.Iterator$$anon.next(Iterator.scala:410)
at scala.collection.Iterator$$anon.next(Iterator.scala:410)
at org.apache.spark.sql.execution.columnar.CachedRDDBuilder$$anonfun$$anon.next(InMemoryRelation.scala:100)
at org.apache.spark.sql.execution.columnar.CachedRDDBuilder$$anonfun$$anon.next(InMemoryRelation.scala:90)
at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:221)
at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:298)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator.apply(BlockManager.scala:1165)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator.apply(BlockManager.scala:1156)
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun.apply(Executor.scala:402)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Looking at where it fails, the error seems to come from the row conversion in Conversions.scala, specifically the call to data.toDouble:
at net.snowflake.spark.snowflake.Conversions$$anonfun.apply(Conversions.scala:156)
If the input is inf, data.toDouble will not work. In Scala, the value would have to be Infinity (which is what Double.PositiveInfinity.toString returns).
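The behavior is easy to reproduce in a Scala REPL (a minimal illustration: StringOps.toDouble delegates to java.lang.Double.parseDouble, which accepts the Java spellings Infinity and -Infinity but rejects Snowflake's inf):

import scala.util.Try

// parseDouble accepts Java's spelling of the special values...
"Infinity".toDouble   // Double.PositiveInfinity
"-Infinity".toDouble  // Double.NegativeInfinity

// ...but not Snowflake's spelling, which is what the connector receives:
Try("inf".toDouble)   // Failure(java.lang.NumberFormatException: For input string: "inf")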
What would be a good workaround to avoid crashing in situations like this?
As of v2.6.0 of the Spark connector this has been fixed; here is the PR.
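For connector versions before 2.6.0, one workaround is to push the cleanup into the query that the connector sends to Snowflake, so the special float values never reach the Spark-side parser. A minimal sketch, assuming hypothetical table and column names (my_table, val) and placeholder connection options; IFF and the 'inf'::float literals are standard Snowflake SQL:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("snowflake-inf-workaround").getOrCreate()

// Placeholder connection options; fill in your own account details.
val sfOptions = Map(
  "sfURL"       -> "<account>.snowflakecomputing.com",
  "sfUser"      -> "<user>",
  "sfPassword"  -> "<password>",
  "sfDatabase"  -> "<database>",
  "sfSchema"    -> "<schema>",
  "sfWarehouse" -> "<warehouse>"
)

// Replace inf / -inf with NULL on the Snowflake side, so the connector's
// CSV path never sees a value that Double.parseDouble cannot handle.
val df = spark.read
  .format("net.snowflake.spark.snowflake")
  .options(sfOptions)
  .option("query",
    """SELECT IFF(val IN ('inf'::float, '-inf'::float), NULL, val) AS val
      |FROM my_table""".stripMargin)
  .load()

If you need to keep the infinities rather than NULL them out, a variant of the same idea is to cast the column to VARCHAR in the query and convert it back to Double on the Spark side, mapping inf and -inf to Double.PositiveInfinity and Double.NegativeInfinity yourself.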