ARM STLR memory ordering semantics

Question

I'm struggling with the exact semantics of ARM STLR.

According to the documentation, it has release semantics. So if you have an STLR store, you would get:

[StoreStore][LoadStore]
X=r1

Where X is memory and r1 is some register.

The problem is that a release store followed by an acquire load does not provide sequential consistency:

[StoreStore][LoadStore]
X=r1
r2=Y
[LoadLoad][LoadStore]

In the above case, X=r1 and r2=Y may be reordered. To make this sequentially consistent, a [StoreLoad] needs to be added:

[StoreStore][LoadStore]
X=r1
[StoreLoad]
r2=Y
[LoadLoad][LoadStore]

And you normally put this fence on the store side, because loads are more frequent than stores.

On X86, plain stores are release stores and plain loads are acquire loads. The [StoreLoad] can be implemented with an MFENCE, or with LOCK ADDL %(RSP),0 as is done in the Hotspot JVM.

Looking at the ARM documentation, it seems that LDAR has acquire semantics; so that would be [LoadLoad][LoadStore].

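But the semantics of STLR are unclear. When I compile a C++ atomic store with memory_order_seq_cst, there is only an STLR; there is no DMB. So STLR seems to give stronger memory ordering guarantees than a release store. To me it seems that, at the fence level, an STLR is equivalent to: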

[StoreStore][LoadStore]
X=r1
[StoreLoad]

Can someone explain this?

Answer 1

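I'm just learning about this myself, so take it with a grain of salt. But my understanding is that in ARMv8/AArch64, STLR/LDAR do provide semantics beyond the usual definitions of release/acquire, though not as strong as you suggest. Namely, a release store STLR is sequentially consistent with respect to an acquire load LDAR that follows it in program order, but not with respect to ordinary LDR loads.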

From the ARMv8 Architecture Reference Manual, B2.3.7, "Load-Acquire, Load-AcquirePC, and Store-Release":

Where a Load-Acquire appears in program order after a Store-Release, the memory access generated by the Store-Release instruction is Observed-by each PE to the extent that PE is required to observe the access coherently, before the memory access generated by the Load-Acquire instruction is Observed-by that PE, to the extent that the PE is required to observe the access coherently.

From B2.3.2, "Ordering relations":

A read or a write RW1 is Barrier-ordered-before a read or a write RW2 from the same Observer if and only if RW1 appears in program order before RW2 and any of the following cases apply: [...] RW1 is a write W1 generated by an instruction with Release semantics and RW2 is a read R2 generated by an instruction with Acquire semantics.

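As a test, I borrowed LWimsey's lock test program. With clang 11.0 on godbolt, you can see that even though sequential consistency is requested, the compiler still generates STLR, LDAR to take the lock (lines 18-19 of the assembly), with no DMB. I ran it for a while (Raspberry Pi 4B, Cortex A72, 4 cores) and saw no violations.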

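However, contrary to what you suggest, STLR can still be reordered with a plain (non-acquire) load that follows it, so it does not implicitly include a full StoreLoad fence. I modified LWimsey's program to use STLR, LDR instead, and after adding some extra junk to provoke the race, I was able to see lock violations.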

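Likewise, LDAR can be reordered with a plain (non-release) store that precedes it. Similarly, I was able to get lock violations in the test program using STR, LDAR.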

