[PATCH] D130729: [InferAddressSpaces] [AMDGPU] Add inference for flat_atomic intrinsics

Mon Aug 8 17:47:15 PDT 2022

jrbyrnes added inline comments.

================
Comment at: llvm/test/CodeGen/AMDGPU/gep-const-offset-address-space.ll:158
+; CHECK-NEXT:    v_pk_mov_b32 v[0:1], s[6:7], s[6:7] op_sel:[0,1]
+; CHECK-NEXT:    global_atomic_add_f64 v2, v[0:1], s[0:1]
+; CHECK-NEXT:    s_endpgm
----------------
b-sumner wrote:
> rampitec wrote:
> > arsenm wrote:
> > > rampitec wrote:
> > > > arsenm wrote:
> > > > > jrbyrnes wrote:
> > > > > > arsenm wrote:
> > > > > > > I wouldn't expect this transform to happen. I would expect to emit the flat instruction for the flat atomic despite the address space
> > > > > > Not for this PHI test in particular, but for all these tests in which we lower to a global_atomic, right? 
> > > > > Yes. I expect the flat atomic intrinsic to give the flat instruction regardless of address space
> > > > If we know its AS exactly why not to do it? Especially that we are widely using code specialization with AS checking when flat atomic is unavailable.
> > > I thought the whole reason we had these address space specific intrinsics in the first place was because of the painfully divergent behaviors in the instructions
> > Added @b-sumner. There is some divergence between DS and VMEM, I do not recall global vs flat within the same GPU. But then I believe these intrinsics exist to use what the target can offer, so mostly because of the divergence between GPUs itself.
> Agreed.  These intrinsics are used to expose HW capabilities when available, and users will be pleased if we can specialize to a known address space.
Hi All - thanks for the thoughts and discussion!

Hi Matt -- I took a look, and AtomicLoadFAdd SDNodes with AddressSpace(1) pointer operands have ISel patterns to match to either global_atomic or flat_atomic. However, it appears the prioritization / complexity model in FLATInstructions.td favors global intrinsics over flat atomics when both are feasible, which is why we lower to the global intrinsic here.

It seems the consensus is to specialize into globals where possible (as is done here)?

If so, the concern I have is that is that this behavior does not occur for global-isel (at least not for this test). The node is lowered to flat (despite the address space inference) and we are seeing the illegal offset in generated code. I wonder if address space specialization in global-isel is something that should be addressed in a separate ticket.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130729/new/

https://reviews.llvm.org/D130729