[PATCH] D39350: AMDGPU: Add CPUCoherentL2 feature

Jan Vesely via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Oct 27 11:51:32 PDT 2017


jvesely added a comment.

In https://reviews.llvm.org/D39350#909481, @t-tye wrote:

> In https://reviews.llvm.org/D39350#909426, @jvesely wrote:
>
> > In https://reviews.llvm.org/D39350#908854, @t-tye wrote:
> >
> > > Are we sure using SLC is the way to achieve this? IIRC SLC can be used for streaming, but does not ensure L2 bypass. On an APU the MTYPE=CC specifies the memory policy that supports coherence.
> >
> >
> > The CI and GCN3 ISA specs for SLC say: "System Level Coherent. When set, accesses are forced to miss in level 2 texture cache and are coherent with system memory."
> >  Has this been changed?
> >  I only found MTYPE references for image and buffer rsrc; is there a way to set it for flat ops?
> >  The ISA specs also don't mention what values are allowed in those 3 bits.
>
>
> I do not think using SLC will achieve what you are looking for. The default memory policies for SLC are to enable STREAMING mode which leaves cache lines in the L2 cache and so will not achieve coherence. What you need is for the L2 cache to be kept coherent with the memory fabric which is what the MTYPE and IOMMUv2 can provide on APUs. The runtime can configure the hardware to do this. Buffer instructions use V# that can specify the MTYPE, and there are configuration registers that can be set for each aperture with the MTYPE to use.
>
> What runtime are you intending to use to load and execute the code produced?
>
> How were you thinking of controlling when to enable this? You would not want to affect the existing generated code, since adding SLC will make code execute more slowly.


For my specific use case (system calls) I use HCC, but I'd expect this to generally apply to system scope atomics.
The bug has been reported here [0], and I wrote simple atomic tests [1].

Configuring the "System Level Coherent" flag so that it does not guarantee system level coherence is a rather counterintuitive decision.
I'm not sure I follow the performance or MTYPE argument. Only system scope atomic operations need this; I'd expect them to be slow anyway, and they should be used sparingly.
Can you point me to where MTYPE values are set (and documented)? I only found rsrc descriptor setup for scratch memory in ROCR.
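
To illustrate the access pattern under discussion, here is a minimal host-only sketch. It is not taken from the tests in [1]; the names `run_handshake`, `payload`, and `ready` are hypothetical. It assumes standard C++ `std::atomic` between two CPU threads as a stand-in for a GPU agent signalling the host, so it only shows the ordering that a system scope atomic is expected to provide and says nothing about SLC or MTYPE behaviour on real hardware.

```cpp
#include <atomic>
#include <thread>

// Host-only stand-in: two CPU threads play the "device" and "host" roles.
// On a GPU the flag would be a system scope atomic; that part is assumed here.
int run_handshake() {
    int payload = 0;             // plain data published by the producer
    std::atomic<int> ready{0};   // flag that both sides must observe coherently

    std::thread producer([&] {
        payload = 42;                               // write the data first
        ready.store(1, std::memory_order_release);  // then publish the flag
    });

    // The consumer spins until the flag becomes visible. If the flag update
    // never became visible to the other side, this loop would not terminate.
    while (ready.load(std::memory_order_acquire) == 0) {
        std::this_thread::yield();
    }
    int seen = payload;  // ordered after the acquire load, so the write is visible

    producer.join();
    return seen;
}
```

Calling `run_handshake()` returns 42 once the flag has been observed. The release/acquire pair is what makes the plain `payload` write visible to the consumer.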

[0] https://github.com/RadeonOpenCompute/hcc/issues/410
[1] https://github.com/jvesely/hcc-atomic-test


Repository:
  rL LLVM

https://reviews.llvm.org/D39350




