Scala

Question

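I have run into a strange behavior and cannot find any hint as to why it happens. In this example I use the estimate method of SizeEstimator from Spark. I found no glitch in their code, so assuming it gives a good estimate of memory, I wonder why I get this: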

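import org.apache.spark.util.SizeEstimator
import scala.collection.mutable.ArrayBuffer

// Fill a buffer with three (Int, Double) tuples
val buf1 = new ArrayBuffer[(Int,Double)]
var i = 0
while (i < 3) {
   buf1 += ((i,i.toDouble))
   i += 1
}
// Estimated size of the whole buffer
System.out.println(s"Raw size with doubles: ${SizeEstimator.estimate(buf1)}")
// Sum of the estimated sizes of the individual tuples
val ite1 = buf1.toIterator
var size1: Long = 0L
while (ite1.hasNext) {
   val cur = ite1.next()
   size1 += SizeEstimator.estimate(cur)
}
System.out.println(s"Size with doubles: $size1")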

// Same measurements with (Int, Float) tuples
val buf2 = new ArrayBuffer[(Int,Float)]
i = 0
while (i < 3) {
   buf2 += ((i,i.toFloat))
   i += 1
}
System.out.println(s"Raw size with floats: ${SizeEstimator.estimate(buf2)}")
val ite2 = buf2.toIterator
var size2: Long = 0L
while (ite2.hasNext) {
   val cur = ite2.next()
   size2 += SizeEstimator.estimate(cur)
}
System.out.println(s"Size with floats: $size2")

The console output is:

Raw size with doubles: 200
Size with doubles: 96
Raw size with floats: 272
Size with floats: 168

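So my question is quite naive: why do floats take more memory than doubles in this case? And why does it get even worse when I go through the iterator? For the raw buffers the doubles-to-floats ratio is about 75% (200/272), but it drops to under 60% (96/168) when I sum the sizes over the iterator!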

(For more context: I ran into this while trying to "optimize" a Spark application by changing Double to Float, and found that the floats actually took more memory than the doubles...)

P.S.: This is not due to the small buffer size (3 here). If I use 100 instead, I get:

Raw size with doubles: 3752
Size with doubles: 3200
Raw size with floats: 6152
Size with floats: 5600

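And floats still consume more memory... but the ratio has stabilized, so I guess the different ratios I saw when going through the iterator must be due to some overhead.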
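One way to read these figures is to split each raw size into a per-tuple cost and a remainder. The sketch below only redoes the arithmetic on the numbers printed above; the names SizeBreakdown and Run are made up for illustration and are not part of Spark.

```scala
object SizeBreakdown {
  // One measurement: element count, raw buffer estimate, sum of per-tuple estimates.
  // All figures are copied from the console output quoted in the question.
  final case class Run(n: Int, raw: Long, summed: Long) {
    def perTuple: Long = summed / n     // estimated bytes per tuple
    def remainder: Long = raw - summed  // bytes not attributed to the tuples
  }

  val doubles3: Run   = Run(3, 200L, 96L)
  val floats3: Run    = Run(3, 272L, 168L)
  val doubles100: Run = Run(100, 3752L, 3200L)
  val floats100: Run  = Run(100, 6152L, 5600L)

  def main(args: Array[String]): Unit = {
    val runs = Seq(
      "doubles, n=3"   -> doubles3,
      "floats, n=3"    -> floats3,
      "doubles, n=100" -> doubles100,
      "floats, n=100"  -> floats100
    )
    for ((label, run) <- runs)
      println(s"$label: ${run.perTuple} bytes per tuple, remainder ${run.remainder}")
  }
}
```

On these numbers each (Int, Double) is estimated at 32 bytes and each (Int, Float) at 56 bytes, while the remainder is identical for both element types (104 bytes for 3 elements, 552 bytes for 100). That is consistent with the guess above: the raw ratio differs from the summed ratio only because the same buffer overhead is added to both sides.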

Edit: It seems that Product2 is actually only specialized for Int, Long and Double:

trait Product2[@specialized(Int, Long, Double) +T1, @specialized(Int, Long, Double) +T2] extends Any with Product

Does anyone know why Float is not taken into account? Short is not either, which leads to the same weird behavior...

Answer 1

This is because Tuple2 is @specialized for Double but not for Float.

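This means that (Int, Double) is represented as a structure with two fields of the primitive Java types int and double, whereas (Int, Float) falls back to the generic Tuple2, whose fields are object references, so the float is stored as a boxed java.lang.Float (and the int as a boxed java.lang.Integer). The boxes are separate heap objects, which is where the extra memory comes from.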
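A quick way to see the difference is to look at the runtime classes of the two tuples. This is a minimal sketch, assuming Scala 2.12 or 2.13; the object name TupleSpecializationCheck and the case class IntFloat are hypothetical and only serve the illustration. IntFloat is included as one possible alternative to (Int, Float) when memory matters, since a case class keeps its fields as primitives.

```scala
object TupleSpecializationCheck {
  // Hypothetical replacement for (Int, Float): the fields are stored as
  // primitive int and float, so no boxing is involved.
  final case class IntFloat(i: Int, f: Float)

  // Runtime class of a tuple that has a specialized variant.
  val doubleTupleClass: String = (1, 2.0).getClass.getName
  // Runtime class of a tuple with no specialized variant (generic Tuple2).
  val floatTupleClass: String = (1, 2.0f).getClass.getName
  // JVM type of the field backing IntFloat.f, read via reflection.
  val intFloatFieldType: Class[_] = classOf[IntFloat].getDeclaredField("f").getType

  def main(args: Array[String]): Unit = {
    println(s"(Int, Double) -> $doubleTupleClass")
    println(s"(Int, Float)  -> $floatTupleClass")
    println(s"IntFloat.f    -> $intFloatFieldType")
  }
}
```

With Scala 2 the (Int, Double) tuple should report a specialized subclass name (one starting with scala.Tuple2 followed by a specialization suffix), while the (Int, Float) tuple reports plain scala.Tuple2. Whether a dedicated case class is worth it in a Spark job depends on the rest of the pipeline, so treat it as an option to measure rather than a guaranteed win.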

More discussion here

Scala - why Double consume less memory than Floats in this case?

memory

scala-collections

apache-spark