Snapshot isolation for reliable queue is broken?

Update: This bug was fixed in Service Fabric SDK version 2.4.164 on February 3, 2017. Quote from the release notes:

Fix to ReliableQueue to correctly handle additional transaction levels and combinations

Fixing a bug where ReliableQueue.GetCountAsync did not adhere to Snapshot Isolation if it was also doing Read Your Own Write. This issue was originally reported on . Thank you for your bug reports.


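I am writing mocks for the Service Fabric reliable collections. I need these mocks to mimic the transactional behavior of the real implementation as closely as possible.

So I wrote several test cases that I run to verify that my mocks behave like the real implementation.

However, in some of the test cases that deal with snapshot isolation, my mocks behave differently. On closer inspection, though, I am not so sure that the error is on my side.

So I think I may have stumbled upon a bug in how reliable queues implement snapshot isolation.

The MSDN docs on snapshot isolation say: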

The transaction can recognize only data modifications that were committed before the start of the transaction. Data modifications made by other transactions after the start of the current transaction are not visible to statements executing in the current transaction.

And:

Reliable Queue support Read Your Writes. In other words, any write within a transaction will be visible to a following read that belongs to the same transaction.

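So operations that require snapshot isolation (such as GetCountAsync) should see a consistent snapshot that is unaffected by other transactions. Only changes made by the transaction that owns the snapshot should be visible.

This holds for reliable dictionaries, but not for reliable queues.

A snapshot taken of a reliable queue (via GetCountAsync or CreateEnumerableAsync) is indeed unaffected by modifications made by other transactions, but only as long as we make no changes ourselves. Making a change not only makes our own change visible in the snapshot, it also exposes changes from other transactions.

The following code snippet can be dropped into a reliable service to reproduce the problem: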

public async Task Verify_that_reliable_queue_snapshot_isolation_is_broken()
{
    // Get an empty reliable queue
    var name = Guid.NewGuid().ToString();
    var queue = await this.StateManager.GetOrAddAsync<IReliableQueue<string>>(name);

    // Start transaction and take a snapshot by getting queue count.
    // The using block disposes t1, which is never committed.
    using (var t1 = this.StateManager.CreateTransaction())
    {
        Assert.AreEqual(0, await queue.GetCountAsync(t1)); // ok

        // Enqueue something in a concurrent transaction
        using (var t2 = this.StateManager.CreateTransaction())
        {
            await queue.EnqueueAsync(t2, "something");
            await t2.CommitAsync();
        }

        // Snapshot should still say zero
        Assert.AreEqual(0, await queue.GetCountAsync(t1)); // ok

        // Enqueue something else in the first transaction
        await queue.EnqueueAsync(t1, "something else");

        // Count should now be 1 in t1, but it's actually 2.
        Assert.AreEqual(2 /* should be 1*/, await queue.GetCountAsync(t1)); // broken!
    }
}
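
For comparison, the semantics the quoted documentation seems to describe (snapshot isolation combined with read-your-own-writes) can be sketched as a small in-memory toy model. This is only an illustration under stated assumptions, not how Service Fabric is implemented: ToyQueue, ToyTransaction and Demo are hypothetical names, and the model assumes the snapshot is fixed at the first read inside a transaction, as the comment in the test above suggests.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy model; not part of Service Fabric.
public sealed class ToyQueue
{
    // Items made visible by committed transactions.
    private readonly List<string> committed = new List<string>();

    public ToyTransaction Begin() => new ToyTransaction(this);

    public sealed class ToyTransaction
    {
        private readonly ToyQueue owner;
        // Writes made by this transaction that are not yet committed.
        private readonly List<string> pending = new List<string>();
        // Committed count as seen by this transaction (assumption: fixed at first read).
        private int? snapshotCount;

        internal ToyTransaction(ToyQueue owner) { this.owner = owner; }

        public void Enqueue(string item) => pending.Add(item);

        public int GetCount()
        {
            if (!snapshotCount.HasValue)
            {
                snapshotCount = owner.committed.Count;
            }
            // Snapshot isolation: ignore commits made after the snapshot.
            // Read your own writes: add this transaction's pending items.
            return snapshotCount.Value + pending.Count;
        }

        public void Commit()
        {
            owner.committed.AddRange(pending);
            pending.Clear();
        }
    }
}

public static class Demo
{
    // Replays the steps of the test above against the toy model.
    public static int[] RunScenario()
    {
        var queue = new ToyQueue();
        var t1 = queue.Begin();
        int first = t1.GetCount();      // snapshot taken here

        var t2 = queue.Begin();
        t2.Enqueue("something");
        t2.Commit();                    // concurrent commit

        int second = t1.GetCount();     // still the snapshot value
        t1.Enqueue("something else");
        int third = t1.GetCount();      // snapshot plus own write
        return new[] { first, second, third };
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", RunScenario())); // prints "0, 0, 1"
    }
}
```

Under these assumptions the last count is 1, which is the value the test above expects, whereas the real reliable queue returned 2.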

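I need to know whether this behavior is by design, whether the documentation is incorrect, or whether this is a bug. Or whether I have misunderstood something.

Any feedback is welcome.

Thank you for reporting this issue, Marten. This is a bug in Reliable Queue. We will fix it as soon as possible. We apologize for the inconvenience.

I will update this thread once the issue is resolved.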