为什么数组的序列表达式在 F# 中这么慢?

Why sequence expressions for arrays are so slow in F#?

代码:

#time "on"

let newVector = [| 
  for v in 1..10000000 ->
    v |] 

let newVector2 = 
  let a = Array.zeroCreate 10000000
  for v in 1..10000000 do
    a.[v-1] <- v
  a

let newVector3 = 
  let a = System.Collections.Generic.List() // do not set capacity
  for v in 1..10000000 do
    a.Add(v)
  a.ToArray()

在 FSI 中给出时间:

--> Timing now on

> 
Real: 00:00:01.121, CPU: 00:00:01.156, GC gen0: 4, gen1: 4, gen2: 4

val newVector : int [] =  [|1; 2; 3; 4; ...|]

> 
Real: 00:00:00.024, CPU: 00:00:00.015, GC gen0: 0, gen1: 0, gen2: 0

val newVector2 : int32 [] =  [|1; 2; 3; 4; ...|]

> 
Real: 00:00:00.173, CPU: 00:00:00.156, GC gen0: 2, gen1: 2, gen2: 2

val newVector3 : int32 [] =  [|1; 2; 3; 4;  ...|]

独立应用程序没有太大区别,发布模式,没有调试,平均运行 5 次:

第三种方法不知道原始容量,但仍然快了将近 7 倍。为什么数组的序列表达式在 F# 中这么慢?

更新:

let seq = seq { for v in 1..10000000 do yield v }
let seqArr = seq |> Seq.toArray
Real: 00:00:01.060, CPU: 00:00:01.078, GC gen0: 2, gen1: 2, gen2: 2

let newVector4 = 
  let a = System.Collections.Generic.List() // do not set capacity
  for v in seq do
    a.Add(v)
  a.ToArray()
Real: 00:00:01.119, CPU: 00:00:01.109, GC gen0: 1, gen1: 1, gen2: 1

open System.Linq
let newVector5 =  seq.ToArray()
Real: 00:00:00.969, CPU: 00:00:00.968, GC gen0: 0, gen1: 0, gen2: 0

这给出了与第一个相同的时间并且不依赖于 GC。所以 真正的问题 是,为什么枚举 1..10000000 慢得多以至于 for 在第二种和第三种情况下循环?

更新 2

open System
open System.Linq
open System.Collections.Generic
let newVector5 =  seq.ToArray()

let ie count = 
  { new IEnumerable<int> with
      member x.GetEnumerator(): Collections.IEnumerator = x.GetEnumerator() :> Collections.IEnumerator

      member x.GetEnumerator(): IEnumerator<int> = 
        let c = ref 0
        { new IEnumerator<int> with
            member y.MoveNext() = 
              if !c < count then
                c := !c + 1
                true
              else false

            member y.Current with get() = !c + 1
            member y.Current with get() = !c + 1 :> obj
            member y.Dispose() = () 
            member y.Reset() = ()       
        }
  }


let newVector6 = 
  let a = System.Collections.Generic.List() // do not set capacity
  for v in ie 10000000 do
    a.Add(v)
  a.ToArray()
Real: 00:00:00.185, CPU: 00:00:00.187, GC gen0: 1, gen1: 1, gen2: 1

IEnumerable 的手动实现等同于for 循环。我想知道为什么 lo..hi 的扩展对于一般情况应该慢得多。至少对于最常见的类型,它可以通过方法重载来实现。

笑着说,我将您的数组理解转储到 ILDASM 中以查看会发生什么。我将 let 放入 main 并得到了这个:

.locals init ([0] int32[] newVector,
       [1] class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<string[],class [FSharp.Core]Microsoft.FSharp.Core.Unit> V_1,
       [2] string[] V_2)
IL_0000:  nop
IL_0001:  ldc.i4.0
IL_0002:  ldnull
IL_0003:  ldc.i4.0
IL_0004:  ldc.i4.0
IL_0005:  newobj     instance void Program/newVector@11::.ctor(int32,
                                                             class [mscorlib]System.Collections.Generic.IEnumerator`1<int32>,
                                                             int32,
                                                             int32)
IL_000a:  call       !!0[] [FSharp.Core]Microsoft.FSharp.Collections.SeqModule::ToArray<int32>(class [mscorlib]System.Collections.Generic.IEnumerable`1<!!0>)
IL_000f:  stloc.0

因此创建了一个 newVector@11 的实例,并且 class 继承自 GeneratedSequenceBase,后者本身实现了 IENumerable。这是有道理的,因为接下来要调用 Seq.ToArray。查看 that class,无法确定序列的长度是否已知,即使在这种情况下它是已知的,因为 IEnumerable 的性质。这告诉我它应该等同于:

let seqArr = seq { for v in 1..10000000 do yield v } |> Seq.toArray

时间证明了这一点:

Real: 00:00:01.032, CPU: 00:00:01.060, GC gen0: 4, gen1: 4, gen2: 4

为了最后一点乐趣,我通过 ILDASM 和包装器 class 和其中的 GenerateNext 方法对上述序列理解进行了说明,这些说明与数组理解相同。所以我认为可以 非常 安全地得出结论,任何形式的数组理解:

let arr = [| sequence-expr |]

100% 等同于:

let arr = seq { sequence-expr } |> Seq.toArray

F# array documentation 中的以下行暗示了这一点:

You can also use sequence expressions to create arrays. Following is an example that creates an array of squares of integers from 1 to 10.

这是真的 "it's a sequence, not its own thing."

在这种情况下,我总是使用 .NET 中许多优秀的反编译器之一来检查生成的代码。

let explicitArray () = 
  let a = Array.zeroCreate count
  for v in 1..count do
    a.[v-1] <- v
  a

这被编译成等效的 C#:

public static int[] explicitArray()
{
    int[] a = ArrayModule.ZeroCreate<int>(10000000);
    for (int v = 1; v < 10000001; v++)
    {
        a[v - 1] = v;
    }
    return a;
}

这差不多是最有效的了。

let arrayExpression () = 
  [|  for v in 1..count -> v |] 

另一方面,这变成:

public static int[] arrayExpression()
{
    return SeqModule.ToArray<int>(new Program.arrayExpression@7(0, null, 0, 0));
}

这相当于:

let arrayExpression () = 
  let e = seq { for v in 1..count -> v }
  let a = List() // do not set capacity
  for v in e do
    a.Add(v)
  a.ToArray()

当遍历 seqIEnumerable 的别名)时,首先调用 MoveNext,然后调用 Current。这些是 JIT:er 很少可以内联的虚拟调用。检查 JIT:ed 汇编代码,我们看到:

mov         rax,qword ptr [rbp+10h]  
cmp         byte ptr [rax],0  
mov         rcx,qword ptr [rbp+10h]  
lea         r11,[7FFC07830030h]  
# virtual call .MoveNext
call        qword ptr [7FFC07830030h]  
movzx       ecx,al  
# if .MoveNext returns false then exit
test        ecx,ecx  
je          00007FFC079408A0  
mov         rcx,qword ptr [rbp+10h]  
lea         r11,[7FFC07830038h]  
# virtual call .Current
call        qword ptr [7FFC07830038h]  
mov         edx,eax  
mov         rcx,rdi  
# call .Add
call        00007FFC65C8B300  
# loop
jmp         00007FFC07940863  

如果我们将其与使用 ResizeArray (List)

的 JIT:ed 代码进行比较
lea         edx,[rdi-1]  
mov         rcx,rbx  
# call .Add
call        00007FFC65C8B300  
mov         edx,edi  
mov         rcx,rbx  
# call .Add
call        00007FFC65C8B300  
lea         edx,[rdi+1]  
mov         rcx,rbx  
# call .Add
call        00007FFC65C8B300  
lea         edx,[rdi+2]  
mov         rcx,rbx  
# call .Add
call        00007FFC65C8B300  
add         edi,4  
# loop
cmp         edi,989682h  
jl          00007FFC07910384  

此处 JIT:er 已展开循环 4 次,我们只有对 List.Add.

的非虚拟调用

这解释了为什么 F# 数组表达式比您的其他两个示例慢。

为了修复这个问题,必须修复 F# 中的 optimizer 以识别表达式的形状,例如:

seq { for v in 1..count -> v } |> Seq.toArray

并将它们优化成:

let a = Array.zeroCreate count
for v in 1..count do
    a.[v-1] <- v
a

面临的挑战是找到一种足够通用且有用但又不会破坏 F# 语义的优化。