Why does PSeq.map with a computation expression seem to hang?
I am writing a scraper using FSharp.Collections.ParallelSeq and a retry computation expression. I want to retrieve HTML from multiple pages in parallel, and I want to retry when a request fails.
For example:
open System
open FSharp.Collections.ParallelSeq

type RetryBuilder(max) =
    member x.Return(a) = a               // Enable 'return'
    member x.Delay(f) = f                // Gets wrapped body and returns it (as it is)
                                         // so that the body is passed to 'Run'
    member x.Zero() = failwith "Zero"    // Support if .. then
    member x.Run(f) =                    // Gets function created by 'Delay'
        let rec loop(n) =
            if n = 0 then failwith "Failed"  // Number of retries exceeded
            else try f() with _ -> loop(n-1)
        loop max

let retry = RetryBuilder(4)

let getHtml (url : string) = retry {
    Console.WriteLine("Get Url")
    return 0;
}

//A property/field?
let GetHtmlForAllPages =
    let pages = {1 .. 10}
    let allHtml = pages |> PSeq.map(fun x -> getHtml("http://somesite.com/" + x.ToString())) |> Seq.toArray
    allHtml

[<EntryPoint>]
let main argv =
    let htmlForAllPages = GetHtmlForAllPages
    0 // return an integer exit code
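When I try to interact with GetHtmlForAllPages from main, the code seems to hang. Stepping through the code shows that PSeq.map starts processing the first four values of pages.
What is happening that causes the retry computation expression to never start or complete? Is there some strange interaction between PSeq and retry?
If I make GetHtmlForAllPages a function and call it, the code works as expected. I'm curious what happens when GetHtmlForAllPages is a field.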
It looks like you are running into a deadlock in a static constructor. The scenario is described here:
The CLR uses an internal lock to ensure that static constructor:
- is only called once
- gets executed before creation of any instance of the class or before accessing any static members.
With this behaviour of CLR, there is a potential opportunity of a deadlock if we perform any asynchronous blocking operation in a static constructor. (...)
The main thread will wait for the helper thread to complete within the static constructor. Since the helper thread is accessing the instance method, it will first try to acquire the internal lock. As internal lock is already acquired by the main thread, we will end-up in a deadlock situation.
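Using Parallel LINQ (or any similar library, such as FSharp.Collections.ParallelSeq) in a static constructor will run you into this problem.
Unfortunately, the static constructor of a compiler-generated class is exactly where your GetHtmlForAllPages value gets computed. From ILSpy (with C# formatting):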
namespace <StartupCode$ConsoleApplication1>
{
    internal static class $Program
    {
        [DebuggerBrowsable(DebuggerBrowsableState.Never)]
        internal static readonly Program.RetryBuilder retry@17;
        [DebuggerBrowsable(DebuggerBrowsableState.Never)]
        internal static readonly int[] GetHtmlForAllPages@24;
        [DebuggerBrowsable(DebuggerBrowsableState.Never), DebuggerNonUserCode, CompilerGenerated]
        internal static int init@;
        static $Program()
        {
            $Program.retry@17 = new Program.RetryBuilder(4);
            IEnumerable<int> pages = Operators.OperatorIntrinsics.RangeInt32(1, 1, 10);
            ParallelQuery<int> parallelQuery = PSeqModule.map<int, int>(new Program.allHtml@26(), pages);
            ParallelQuery<int> parallelQuery2 = parallelQuery;
            int[] allHtml = SeqModule.ToArray<int>((IEnumerable<int>)parallelQuery2);
            $Program.GetHtmlForAllPages@24 = allHtml;
        }
    }
}
And in your actual Program class:
[CompilationMapping(SourceConstructFlags.Value)]
public static int[] GetHtmlForAllPages
{
    get
    {
        return $Program.GetHtmlForAllPages@24;
    }
}
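That is where the deadlock comes from.
Once you change GetHtmlForAllPages to a function (by adding ()), it is no longer part of that static constructor, which makes the program work as expected.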
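A minimal sketch of that change, assuming the FSharp.Collections.ParallelSeq package is referenced as in the question. RetryBuilder, retry and getHtml are copied from the question; the only intended difference is the () on GetHtmlForAllPages, so the parallel work runs when the function is called rather than during static initialization.

```fsharp
// Sketch only: assumes the FSharp.Collections.ParallelSeq package is referenced.
open System
open FSharp.Collections.ParallelSeq

// Retry builder taken from the question: runs the body up to 'max' times.
type RetryBuilder(max) =
    member x.Return(a) = a
    member x.Delay(f) = f
    member x.Zero() = failwith "Zero"
    member x.Run(f) =
        let rec loop n =
            if n = 0 then failwith "Failed"
            else try f() with _ -> loop (n - 1)
        loop max

// Still a module-level value, but constructing it starts no parallel work.
let retry = RetryBuilder(4)

let getHtml (url : string) = retry {
    Console.WriteLine("Get Url")
    return 0
}

// The added () turns the value into a function, so PSeq.map runs
// when the function is called instead of inside the static initializer.
let GetHtmlForAllPages () =
    seq { 1 .. 10 }
    |> PSeq.map (fun x -> getHtml ("http://somesite.com/" + x.ToString()))
    |> Seq.toArray
```

Call it from inside main, for example `let htmlForAllPages = GetHtmlForAllPages ()`. Binding the result in another module-level `let` would likely move the parallel call back into static initialization and bring the hang back.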