Acting on globals faster than passing them as arguments? [julia-lang]

Question

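I have just finished learning Julia (most importantly, the performance tips!). I learned that using global variables makes code slower, and that the remedy is to pass as many variables as possible as function arguments. So I ran the following test: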

x = 10.5  #these are globals
y = 10.5

function bench1()  #acts on global
  z = 0.0
  for i in 1:100
    z += x^y
  end
  return z
end

function bench2(x, y)
  z = 0.0
  for i in 1:100
    z += x^y
  end
  return z
end

function bench3(x::Float64, y::Float64) #acts on arguments
  z::Float64 = 0.0
  for i in 1:100
    z += x^y
  end
  return z
end

@time [bench1() for j in 1:100]
@time [bench2(x,y) for j in 1:100]
@time [bench3(x,y) for j in 1:100]

I have to admit that the results surprised me and contradict what I had read. Results:

0.001623 seconds (20.00 k allocations: 313.375 KB)
0.003628 seconds (2.00 k allocations: 96.371 KB)
0.002633 seconds (252 allocations: 10.469 KB)

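On average, the first function, which acts directly on the globals, is consistently about 2 times faster than the last function, which has all the proper declarations AND does not act directly on the globals. Can anyone explain why?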

Answer 1

I suspect this is mostly a matter of compilation time. If I change the "main" code to

N = 10^2
println("N = $N") 

println("bench1")
@time [bench1() for j in 1:N]
@time [bench1() for j in 1:N]

println("bench2")
@time [bench2(x,y) for j in 1:N]
@time [bench2(x,y) for j in 1:N]

it gives

N = 100
bench1
  0.004219 seconds (21.46 k allocations: 376.536 KB)
  0.001792 seconds (20.30 k allocations: 322.781 KB)
bench2
  0.006218 seconds (2.29 k allocations: 105.840 KB)
  0.000914 seconds (402 allocations: 11.844 KB)

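So in the second measurement, bench1() is ~2 times slower than bench2(). (I omitted bench3() because it gives the same results as bench2().) If we increase N to 10^5, the compilation time becomes negligible compared to the computation time, so the expected speedup of bench2() shows up even in the first measurement.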
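To see the compilation effect in isolation, here is a minimal sketch that times the first and second call of a single function. The name `work` is a hypothetical stand-in for `bench2`, not something from the original post, and the actual timings will depend on the machine and the Julia version.

```julia
# Hypothetical stand-in for bench2 from the question.
function work(x, y)
    z = 0.0
    for i in 1:100
        z += x^y
    end
    return z
end

# The first call triggers JIT compilation for (Float64, Float64) arguments,
# so its elapsed time includes compilation.
first_call = @elapsed work(10.5, 10.5)

# The second call reuses the already compiled method instance.
second_call = @elapsed work(10.5, 10.5)

println("first call:  ", first_call, " s")
println("second call: ", second_call, " s")
```

On a typical setup the first number should be noticeably larger than the second, which is why a warm-up call before `@time` is usually recommended.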

N = 100000
bench1
  1.767392 seconds (20.70 M allocations: 321.219 MB, 8.25% gc time)
  1.720564 seconds (20.70 M allocations: 321.166 MB, 6.26% gc time)
bench2
  0.923315 seconds (799.85 k allocations: 17.608 MB, 0.96% gc time)
  0.922132 seconds (797.96 k allocations: 17.517 MB, 1.08% gc time)

Answer 2

Another issue is that the following still runs in global scope:

@time [bench1() for j in 1:100]
@time [bench2(x,y) for j in 1:100]
@time [bench3(x,y) for j in 1:100]

as you can see from the still large number of allocations reported by @time.

Wrap all of this in a function:

function runbench(N)
    x = 3.0
    y = 4.0
    @time [bench1() for j in 1:N]
    @time [bench2(x,y) for j in 1:N]
    @time [bench3(x,y) for j in 1:N]
end

After warming up with runbench(1), running runbench(10^5) gives me

1.425985 seconds (20.00 M allocations: 305.939 MB, 9.93% gc time)
0.061171 seconds (2 allocations: 781.313 KB)
0.062037 seconds (2 allocations: 781.313 KB)

The total memory allocated in cases 2 and 3 is 10^5 times 8 bytes, as expected: that is just the result array of 10^5 Float64 values.
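The arithmetic behind that figure can be checked directly. This is only a sketch of the calculation; the variable names are illustrative, and it assumes that the "KB" printed by @time means 1024 bytes.

```julia
n = 10^5                              # number of elements in the result array
payload_bytes = n * sizeof(Float64)   # 8 bytes per Float64
payload_kib = payload_bytes / 1024    # convert bytes to units of 1024 bytes

println(payload_bytes, " bytes = ", payload_kib, " KB")
```

This gives 781.25, slightly below the reported 781.313 KB; the small remainder is presumably the array's own bookkeeping overhead.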

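The moral is to mostly ignore the actual timings and look at the memory allocations instead, since that is where the information about type stability shows up.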
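One way to look only at allocations is the `@allocated` macro from Base. The sketch below is an illustration, not the answerer's code: the names `gx`, `gy`, `sum_globals`, `sum_args`, and `measure` are hypothetical, and the measurement is wrapped in a function so that the call sites themselves are not in global scope.

```julia
gx = 10.5   # non-constant globals (hypothetical names)
gy = 10.5

# Reads untyped globals, so the compiler cannot infer the type of gx^gy.
function sum_globals()
    z = 0.0
    for i in 1:100
        z += gx^gy
    end
    return z
end

# Same loop, but the values arrive as arguments with known concrete types.
function sum_args(x, y)
    z = 0.0
    for i in 1:100
        z += x^y
    end
    return z
end

function measure()
    a = 10.5
    b = 10.5
    sum_globals()                 # warm up so compilation is not measured
    sum_args(a, b)
    alloc_globals = @allocated sum_globals()
    alloc_args = @allocated sum_args(a, b)
    return alloc_globals, alloc_args
end

alloc_globals, alloc_args = measure()
println("bytes allocated, globals version:   ", alloc_globals)
println("bytes allocated, arguments version: ", alloc_args)
```

The globals version is expected to report a clearly nonzero byte count while the arguments version should report little or nothing; exact numbers vary by Julia version.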

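Edit: bench3 is an "anti-pattern" in Julia (i.e. a coding style that is not used). You should never annotate types merely to try to fix a type instability; that is not what type annotations are for in Julia.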
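A small sketch of why the annotations do not help here: Julia already compiles a specialized version of an unannotated function for the concrete argument types it is called with. The names `plain` and `annotated` are hypothetical stand-ins for bench2 and bench3, and `Base.return_types` is an unexported reflection helper used here only for illustration.

```julia
# Unannotated, like bench2.
function plain(x, y)
    z = 0.0
    for i in 1:100
        z += x^y
    end
    return z
end

# Annotated, like bench3.
function annotated(x::Float64, y::Float64)
    z::Float64 = 0.0
    for i in 1:100
        z += x^y
    end
    return z
end

# Both are inferred to return Float64 when called with Float64 arguments.
plain_type = Base.return_types(plain, (Float64, Float64))[1]
annotated_type = Base.return_types(annotated, (Float64, Float64))[1]
println(plain_type, " ", annotated_type)

# The annotations only restrict which calls are accepted.
println(plain(2, 3))                        # integer arguments still work
println(applicable(annotated, 2, 3))        # no method for (Int, Int)
```

In this sketch the annotated version gains no inference advantage and merely rejects integer arguments that the plain version handles.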

performance

global

parameter-passing

julia