What is the L1 cache used for in NVIDIA's Maxwell GPUs?

Question

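NVIDIA released its Maxwell GPUs some time ago, but reading the "Maxwell Tuning Guide" left me confused about what the L1 cache does. In the Kepler era, global memory accesses were cached only in L2, and L1 was used to cache local memory accesses, which are caused by register spills. From NVIDIA's documentation, I understood that this local-memory caching was the only thing that could benefit from the L1 cache. However, in section 1.4.2.1 of the "Maxwell Tuning Guide", NVIDIA says: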

As with Kepler, global loads in first-generation Maxwell are cached in L2 only ... Local loads also are cached in L2 only

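CUDA 6.0 added two new device properties, globalL1CacheSupported and localL1CacheSupported, for checking whether a device supports L1 caching of global memory and of local memory, respectively. I tested both properties on a GTX 780 and a GTX 980, and the results confused me even more: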

            globalL1CacheSupported    localL1CacheSupported
GTX 780               1                          1
GTX 980               0                          0

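The GTX 980 result matches what the "Maxwell Tuning Guide" says, and that is exactly what confuses me: if it is true, what is the L1 cache used for? The other thing I don't understand is the GTX 780 result. The GTX 780 is a GK110 card, and according to the GK110 whitepaper, Kepler also caches global memory accesses only in L2, so globalL1CacheSupported returning 1 on the GTX 780 makes no sense to me. I hope someone can clear up my confusion.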
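For reference, here is a minimal sketch of how the flags in the table can be read programmatically, assuming CUDA 6.0 or later. It uses `cudaGetDeviceCount` and `cudaGetDeviceProperties` and reads the `globalL1CacheSupported` and `localL1CacheSupported` fields of `cudaDeviceProp`; the helper `query_l1_flags` and the file name in the build comment are hypothetical.

```cuda
// Minimal sketch: print the L1 caching capability flags added in CUDA 6.0.
// Build (hypothetical file name): nvcc l1_query.cu -o l1_query
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: reads both flags for device `dev`.
// Returns the CUDA error code; the flags are written only on success.
cudaError_t query_l1_flags(int dev, int* global_l1, int* local_l1) {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, dev);
    if (err == cudaSuccess) {
        *global_l1 = prop.globalL1CacheSupported;  // 1 if global loads can be cached in L1
        *local_l1 = prop.localL1CacheSupported;    // 1 if local loads can be cached in L1
    }
    return err;
}

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::printf("No usable CUDA device: %s\n", cudaGetErrorString(err));
        return 0;
    }
    for (int dev = 0; dev < count; ++dev) {
        int global_l1 = -1, local_l1 = -1;
        err = query_l1_flags(dev, &global_l1, &local_l1);
        assert(err == cudaSuccess);
        std::printf("device %d: globalL1CacheSupported=%d localL1CacheSupported=%d\n",
                    dev, global_l1, local_l1);
    }
    return 0;
}
```

The program exits cleanly when no CUDA device or driver is available, so it can be run on any machine with the toolkit installed.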

Answer 1

On Maxwell, the L1 functionality has been combined with the texture cache. The tuning guide mentions this as well.

Fermi devices introduced the L1, which caches both global and local loads. The L1 is a write-through cache, so it has relatively little effect on global and local stores.

With Kepler, L1 is disabled for global loads but is still active for local loads.
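To make "local loads" concrete, here is a minimal sketch of a kernel that typically generates local-memory traffic. Register spills are the case described in the question; a per-thread array indexed with a value known only at run time is another common source, and it is easier to reproduce on purpose. The kernel `pick`, the helper `pick_ref`, and the file name are hypothetical; `-Xptxas -v` makes ptxas report each kernel's stack-frame and spill usage, and an `-arch` matching the target GPU should be added to the build line.

```cuda
// Minimal sketch: a kernel whose per-thread array typically lands in local memory.
//   nvcc -Xptxas -v local.cu -o local
// ptxas -v then reports the kernel's stack frame and spill statistics.
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int kSlots = 32;

__global__ void pick(const int* idx, int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    int table[kSlots];  // per-thread array
    for (int k = 0; k < kSlots; ++k) table[k] = k * k + i;
    // The index is only known at run time, so the array usually cannot stay
    // in registers; this read becomes a local load (cacheable in L1 on Kepler).
    out[i] = table[idx[i] % kSlots];
}

// CPU reference: the value the kernel computes for thread i (idx >= 0).
int pick_ref(int idx, int i) {
    int k = idx % kSlots;
    return k * k + i;
}

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        std::printf("No usable CUDA device; skipping.\n");
        return 0;
    }
    const int n = 4096;
    const size_t bytes = n * sizeof(int);
    std::vector<int> h_idx(n), h_out(n);
    for (int i = 0; i < n; ++i) h_idx[i] = (i * 7) % 100;

    int* d_idx = nullptr;
    int* d_out = nullptr;
    cudaMalloc(&d_idx, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_idx, h_idx.data(), bytes, cudaMemcpyHostToDevice);

    pick<<<(n + 255) / 256, 256>>>(d_idx, d_out, n);
    // cudaMemcpy waits for the kernel to finish before copying back.
    cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);

    for (int i = 0; i < n; ++i) assert(h_out[i] == pick_ref(h_idx[i], i));
    cudaFree(d_idx);
    cudaFree(d_out);
    std::printf("ok\n");
    return 0;
}
```

Error checking is kept to a minimum for brevity; whether the array really ends up in local memory is a compiler decision, which is why the ptxas report is worth checking.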

"then what is L1 cache used for?"

With Maxwell, the default L1 behavior for global loads is the same: they are not cached. However, you can "opt in" to caching global loads in L1, as the Maxwell tuning guide states:

"In a manner similar to Kepler GK110B, GM204 retains this behavior by default but also allows applications to opt-in to caching of global loads in its unified L1/Texture cache. The opt-in mechanism is the same as with GK110B: pass the -Xptxas -dlcm=ca flag to nvcc at compile time."

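GK110B is a variant of GK110 that appears in K40 devices. On K20/K20x, the L1 behavior cannot be changed (it is off for global loads). On K40, the default L1 behavior is the same as on K20/K20x, but the default can be overridden to turn L1 on for global loads.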
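A minimal sketch of what the opt-in looks like in practice, assuming a GM204 target (`-arch=sm_52`, the compute capability of the GTX 980). The kernel source does not change; only the `-Xptxas -dlcm=ca` flag quoted above decides whether its global loads may be cached in L1, while `-dlcm=cg` requests L2-only caching explicitly. The kernel `stencil3`, the reference function `stencil_ref`, and the file name `sum.cu` are hypothetical, and error checking is kept to a minimum.

```cuda
// Minimal sketch: the same source built with and without L1 caching of global loads.
//   Default (global loads not cached in L1):
//     nvcc -arch=sm_52 sum.cu -o sum_l2
//   Opt in to L1 caching of global loads (GK110B / GM204, per the tuning guide):
//     nvcc -arch=sm_52 -Xptxas -dlcm=ca sum.cu -o sum_l1
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// 3-point sum: adjacent threads read overlapping elements of `in`,
// the kind of reuse that L1 caching of global loads may help with.
__global__ void stencil3(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float left  = (i > 0)     ? in[i - 1] : 0.0f;
    float right = (i < n - 1) ? in[i + 1] : 0.0f;
    out[i] = left + in[i] + right;
}

// CPU reference for element i, with zero padding at both ends.
float stencil_ref(const float* in, int n, int i) {
    float left  = (i > 0)     ? in[i - 1] : 0.0f;
    float right = (i < n - 1) ? in[i + 1] : 0.0f;
    return left + in[i] + right;
}

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        std::printf("No usable CUDA device; skipping.\n");
        return 0;
    }
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    std::vector<float> h_in(n), h_out(n);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i % 7);

    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    const int block = 256;
    stencil3<<<(n + block - 1) / block, block>>>(d_in, d_out, n);
    // cudaMemcpy waits for the kernel to finish before copying back.
    cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);

    for (int i = 0; i < n; ++i) assert(h_out[i] == stencil_ref(h_in.data(), n, i));
    cudaFree(d_in);
    cudaFree(d_out);
    std::printf("ok\n");
    return 0;
}
```

Building both variants and timing them is one way to check whether a particular kernel benefits; the result depends on how much reuse its global loads actually have.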
