为什么数组的序列表达式在 F# 中这么慢?
Why sequence expressions for arrays are so slow in F#?
代码:
#time "on"
let newVector = [|
for v in 1..10000000 ->
v |]
let newVector2 =
let a = Array.zeroCreate 10000000
for v in 1..10000000 do
a.[v-1] <- v
a
let newVector3 =
let a = System.Collections.Generic.List() // do not set capacity
for v in 1..10000000 do
a.Add(v)
a.ToArray()
在 FSI 中给出时间:
--> Timing now on
>
Real: 00:00:01.121, CPU: 00:00:01.156, GC gen0: 4, gen1: 4, gen2: 4
val newVector : int [] = [|1; 2; 3; 4; ...|]
>
Real: 00:00:00.024, CPU: 00:00:00.015, GC gen0: 0, gen1: 0, gen2: 0
val newVector2 : int32 [] = [|1; 2; 3; 4; ...|]
>
Real: 00:00:00.173, CPU: 00:00:00.156, GC gen0: 2, gen1: 2, gen2: 2
val newVector3 : int32 [] = [|1; 2; 3; 4; ...|]
独立应用程序没有太大区别,发布模式,没有调试,平均运行 5 次:
- 序列表达式:425
- 预分配:13
- 名单:80
第三种方法不知道原始容量,但仍然快了将近 7 倍。为什么数组的序列表达式在 F# 中这么慢?
更新:
let seq = seq { for v in 1..10000000 do yield v }
let seqArr = seq |> Seq.toArray
Real: 00:00:01.060, CPU: 00:00:01.078, GC gen0: 2, gen1: 2, gen2: 2
let newVector4 =
let a = System.Collections.Generic.List() // do not set capacity
for v in seq do
a.Add(v)
a.ToArray()
Real: 00:00:01.119, CPU: 00:00:01.109, GC gen0: 1, gen1: 1, gen2: 1
open System.Linq
let newVector5 = seq.ToArray()
Real: 00:00:00.969, CPU: 00:00:00.968, GC gen0: 0, gen1: 0, gen2: 0
这给出了与第一个相同的时间并且不依赖于 GC。所以 真正的问题 是,为什么枚举 1..10000000
慢得多以至于 for
在第二种和第三种情况下循环?
更新 2
open System
open System.Linq
open System.Collections.Generic
let newVector5 = seq.ToArray()
let ie count =
{ new IEnumerable<int> with
member x.GetEnumerator(): Collections.IEnumerator = x.GetEnumerator() :> Collections.IEnumerator
member x.GetEnumerator(): IEnumerator<int> =
let c = ref 0
{ new IEnumerator<int> with
member y.MoveNext() =
if !c < count then
c := !c + 1
true
else false
member y.Current with get() = !c + 1
member y.Current with get() = !c + 1 :> obj
member y.Dispose() = ()
member y.Reset() = ()
}
}
let newVector6 =
let a = System.Collections.Generic.List() // do not set capacity
for v in ie 10000000 do
a.Add(v)
a.ToArray()
Real: 00:00:00.185, CPU: 00:00:00.187, GC gen0: 1, gen1: 1, gen2: 1
IEnumerable 的手动实现等同于for
循环。我想知道为什么 lo..hi
的扩展对于一般情况应该慢得多。至少对于最常见的类型,它可以通过方法重载来实现。
笑着说,我将您的数组理解转储到 ILDASM 中以查看会发生什么。我将 let
放入 main
并得到了这个:
.locals init ([0] int32[] newVector,
[1] class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<string[],class [FSharp.Core]Microsoft.FSharp.Core.Unit> V_1,
[2] string[] V_2)
IL_0000: nop
IL_0001: ldc.i4.0
IL_0002: ldnull
IL_0003: ldc.i4.0
IL_0004: ldc.i4.0
IL_0005: newobj instance void Program/newVector@11::.ctor(int32,
class [mscorlib]System.Collections.Generic.IEnumerator`1<int32>,
int32,
int32)
IL_000a: call !!0[] [FSharp.Core]Microsoft.FSharp.Collections.SeqModule::ToArray<int32>(class [mscorlib]System.Collections.Generic.IEnumerable`1<!!0>)
IL_000f: stloc.0
因此创建了一个 newVector@11 的实例,并且 class 继承自 GeneratedSequenceBase,后者本身实现了 IENumerable。这是有道理的,因为接下来要调用 Seq.ToArray。查看 that class,无法确定序列的长度是否已知,即使在这种情况下它是已知的,因为 IEnumerable 的性质。这告诉我它应该等同于:
let seqArr = seq { for v in 1..10000000 do yield v } |> Seq.toArray
时间证明了这一点:
Real: 00:00:01.032, CPU: 00:00:01.060, GC gen0: 4, gen1: 4, gen2: 4
为了最后一点乐趣,我通过 ILDASM 和包装器 class 和其中的 GenerateNext 方法对上述序列理解进行了说明,这些说明与数组理解相同。所以我认为可以 非常 安全地得出结论,任何形式的数组理解:
let arr = [| sequence-expr |]
100% 等同于:
let arr = seq { sequence-expr } |> Seq.toArray
F# array documentation 中的以下行暗示了这一点:
You can also use sequence expressions to create arrays. Following is an example that creates an array of squares of integers from 1 to 10.
这是真的 "it's a sequence, not its own thing."
在这种情况下,我总是使用 .NET 中许多优秀的反编译器之一来检查生成的代码。
let explicitArray () =
let a = Array.zeroCreate count
for v in 1..count do
a.[v-1] <- v
a
这被编译成等效的 C#:
public static int[] explicitArray()
{
int[] a = ArrayModule.ZeroCreate<int>(10000000);
for (int v = 1; v < 10000001; v++)
{
a[v - 1] = v;
}
return a;
}
这差不多是最有效的了。
let arrayExpression () =
[| for v in 1..count -> v |]
另一方面,这变成:
public static int[] arrayExpression()
{
return SeqModule.ToArray<int>(new Program.arrayExpression@7(0, null, 0, 0));
}
这相当于:
let arrayExpression () =
let e = seq { for v in 1..count -> v }
let a = List() // do not set capacity
for v in e do
a.Add(v)
a.ToArray()
当遍历 seq
(IEnumerable
的别名)时,首先调用 MoveNext
,然后调用 Current
。这些是 JIT:er 很少可以内联的虚拟调用。检查 JIT:ed 汇编代码,我们看到:
mov rax,qword ptr [rbp+10h]
cmp byte ptr [rax],0
mov rcx,qword ptr [rbp+10h]
lea r11,[7FFC07830030h]
# virtual call .MoveNext
call qword ptr [7FFC07830030h]
movzx ecx,al
# if .MoveNext returns false then exit
test ecx,ecx
je 00007FFC079408A0
mov rcx,qword ptr [rbp+10h]
lea r11,[7FFC07830038h]
# virtual call .Current
call qword ptr [7FFC07830038h]
mov edx,eax
mov rcx,rdi
# call .Add
call 00007FFC65C8B300
# loop
jmp 00007FFC07940863
如果我们将其与使用 ResizeArray
(List
)
的 JIT:ed 代码进行比较
lea edx,[rdi-1]
mov rcx,rbx
# call .Add
call 00007FFC65C8B300
mov edx,edi
mov rcx,rbx
# call .Add
call 00007FFC65C8B300
lea edx,[rdi+1]
mov rcx,rbx
# call .Add
call 00007FFC65C8B300
lea edx,[rdi+2]
mov rcx,rbx
# call .Add
call 00007FFC65C8B300
add edi,4
# loop
cmp edi,989682h
jl 00007FFC07910384
此处 JIT:er 已展开循环 4 次,我们只有对 List.Add
.
的非虚拟调用
这解释了为什么 F# 数组表达式比您的其他两个示例慢。
为了修复这个问题,必须修复 F# 中的 optimizer 以识别表达式的形状,例如:
seq { for v in 1..count -> v } |> Seq.toArray
并将它们优化成:
let a = Array.zeroCreate count
for v in 1..count do
a.[v-1] <- v
a
面临的挑战是找到一种足够通用且有用但又不会破坏 F# 语义的优化。
代码:
#time "on"
let newVector = [|
for v in 1..10000000 ->
v |]
let newVector2 =
let a = Array.zeroCreate 10000000
for v in 1..10000000 do
a.[v-1] <- v
a
let newVector3 =
let a = System.Collections.Generic.List() // do not set capacity
for v in 1..10000000 do
a.Add(v)
a.ToArray()
在 FSI 中给出时间:
--> Timing now on
>
Real: 00:00:01.121, CPU: 00:00:01.156, GC gen0: 4, gen1: 4, gen2: 4
val newVector : int [] = [|1; 2; 3; 4; ...|]
>
Real: 00:00:00.024, CPU: 00:00:00.015, GC gen0: 0, gen1: 0, gen2: 0
val newVector2 : int32 [] = [|1; 2; 3; 4; ...|]
>
Real: 00:00:00.173, CPU: 00:00:00.156, GC gen0: 2, gen1: 2, gen2: 2
val newVector3 : int32 [] = [|1; 2; 3; 4; ...|]
独立应用程序没有太大区别,发布模式,没有调试,平均运行 5 次:
- 序列表达式:425
- 预分配:13
- 名单:80
第三种方法不知道原始容量,但仍然快了将近 7 倍。为什么数组的序列表达式在 F# 中这么慢?
更新:
let seq = seq { for v in 1..10000000 do yield v }
let seqArr = seq |> Seq.toArray
Real: 00:00:01.060, CPU: 00:00:01.078, GC gen0: 2, gen1: 2, gen2: 2
let newVector4 =
let a = System.Collections.Generic.List() // do not set capacity
for v in seq do
a.Add(v)
a.ToArray()
Real: 00:00:01.119, CPU: 00:00:01.109, GC gen0: 1, gen1: 1, gen2: 1
open System.Linq
let newVector5 = seq.ToArray()
Real: 00:00:00.969, CPU: 00:00:00.968, GC gen0: 0, gen1: 0, gen2: 0
这给出了与第一个相同的时间并且不依赖于 GC。所以 真正的问题 是,为什么枚举 1..10000000
慢得多以至于 for
在第二种和第三种情况下循环?
更新 2
open System
open System.Linq
open System.Collections.Generic
let newVector5 = seq.ToArray()
let ie count =
{ new IEnumerable<int> with
member x.GetEnumerator(): Collections.IEnumerator = x.GetEnumerator() :> Collections.IEnumerator
member x.GetEnumerator(): IEnumerator<int> =
let c = ref 0
{ new IEnumerator<int> with
member y.MoveNext() =
if !c < count then
c := !c + 1
true
else false
member y.Current with get() = !c + 1
member y.Current with get() = !c + 1 :> obj
member y.Dispose() = ()
member y.Reset() = ()
}
}
let newVector6 =
let a = System.Collections.Generic.List() // do not set capacity
for v in ie 10000000 do
a.Add(v)
a.ToArray()
Real: 00:00:00.185, CPU: 00:00:00.187, GC gen0: 1, gen1: 1, gen2: 1
IEnumerable 的手动实现等同于for
循环。我想知道为什么 lo..hi
的扩展对于一般情况应该慢得多。至少对于最常见的类型,它可以通过方法重载来实现。
笑着说,我将您的数组理解转储到 ILDASM 中以查看会发生什么。我将 let
放入 main
并得到了这个:
.locals init ([0] int32[] newVector,
[1] class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<string[],class [FSharp.Core]Microsoft.FSharp.Core.Unit> V_1,
[2] string[] V_2)
IL_0000: nop
IL_0001: ldc.i4.0
IL_0002: ldnull
IL_0003: ldc.i4.0
IL_0004: ldc.i4.0
IL_0005: newobj instance void Program/newVector@11::.ctor(int32,
class [mscorlib]System.Collections.Generic.IEnumerator`1<int32>,
int32,
int32)
IL_000a: call !!0[] [FSharp.Core]Microsoft.FSharp.Collections.SeqModule::ToArray<int32>(class [mscorlib]System.Collections.Generic.IEnumerable`1<!!0>)
IL_000f: stloc.0
因此创建了一个 newVector@11 的实例,并且 class 继承自 GeneratedSequenceBase,后者本身实现了 IENumerable。这是有道理的,因为接下来要调用 Seq.ToArray。查看 that class,无法确定序列的长度是否已知,即使在这种情况下它是已知的,因为 IEnumerable 的性质。这告诉我它应该等同于:
let seqArr = seq { for v in 1..10000000 do yield v } |> Seq.toArray
时间证明了这一点:
Real: 00:00:01.032, CPU: 00:00:01.060, GC gen0: 4, gen1: 4, gen2: 4
为了最后一点乐趣,我通过 ILDASM 和包装器 class 和其中的 GenerateNext 方法对上述序列理解进行了说明,这些说明与数组理解相同。所以我认为可以 非常 安全地得出结论,任何形式的数组理解:
let arr = [| sequence-expr |]
100% 等同于:
let arr = seq { sequence-expr } |> Seq.toArray
F# array documentation 中的以下行暗示了这一点:
You can also use sequence expressions to create arrays. Following is an example that creates an array of squares of integers from 1 to 10.
这是真的 "it's a sequence, not its own thing."
在这种情况下,我总是使用 .NET 中许多优秀的反编译器之一来检查生成的代码。
let explicitArray () =
let a = Array.zeroCreate count
for v in 1..count do
a.[v-1] <- v
a
这被编译成等效的 C#:
public static int[] explicitArray()
{
int[] a = ArrayModule.ZeroCreate<int>(10000000);
for (int v = 1; v < 10000001; v++)
{
a[v - 1] = v;
}
return a;
}
这差不多是最有效的了。
let arrayExpression () =
[| for v in 1..count -> v |]
另一方面,这变成:
public static int[] arrayExpression()
{
return SeqModule.ToArray<int>(new Program.arrayExpression@7(0, null, 0, 0));
}
这相当于:
let arrayExpression () =
let e = seq { for v in 1..count -> v }
let a = List() // do not set capacity
for v in e do
a.Add(v)
a.ToArray()
当遍历 seq
(IEnumerable
的别名)时,首先调用 MoveNext
,然后调用 Current
。这些是 JIT:er 很少可以内联的虚拟调用。检查 JIT:ed 汇编代码,我们看到:
mov rax,qword ptr [rbp+10h]
cmp byte ptr [rax],0
mov rcx,qword ptr [rbp+10h]
lea r11,[7FFC07830030h]
# virtual call .MoveNext
call qword ptr [7FFC07830030h]
movzx ecx,al
# if .MoveNext returns false then exit
test ecx,ecx
je 00007FFC079408A0
mov rcx,qword ptr [rbp+10h]
lea r11,[7FFC07830038h]
# virtual call .Current
call qword ptr [7FFC07830038h]
mov edx,eax
mov rcx,rdi
# call .Add
call 00007FFC65C8B300
# loop
jmp 00007FFC07940863
如果我们将其与使用 ResizeArray
(List
)
lea edx,[rdi-1]
mov rcx,rbx
# call .Add
call 00007FFC65C8B300
mov edx,edi
mov rcx,rbx
# call .Add
call 00007FFC65C8B300
lea edx,[rdi+1]
mov rcx,rbx
# call .Add
call 00007FFC65C8B300
lea edx,[rdi+2]
mov rcx,rbx
# call .Add
call 00007FFC65C8B300
add edi,4
# loop
cmp edi,989682h
jl 00007FFC07910384
此处 JIT:er 已展开循环 4 次,我们只有对 List.Add
.
这解释了为什么 F# 数组表达式比您的其他两个示例慢。
为了修复这个问题,必须修复 F# 中的 optimizer 以识别表达式的形状,例如:
seq { for v in 1..count -> v } |> Seq.toArray
并将它们优化成:
let a = Array.zeroCreate count
for v in 1..count do
a.[v-1] <- v
a
面临的挑战是找到一种足够通用且有用但又不会破坏 F# 语义的优化。