[PATCH] D64862: AMDGPU/GlobalISel: RegBankSelect interp intrinsics

Mon Jan 20 04:35:51 PST 2020

arsenm marked an inline comment as done.
arsenm added a comment.
Herald added a subscriber: kerbowa.

ping

================
Comment at: lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp:1332
+      // Waterfall loop for m0 value, which is always the last operand.
+      executeInWaterfallLoop(MI, MRI, { MI.getNumOperands() - 1 });
+      return;
----------------
nhaehnle wrote:
> arsenm wrote:
> > nhaehnle wrote:
> > > arsenm wrote:
> > > > nhaehnle wrote:
> > > > > We don't waterfall these in the non-GlobalISel path as far as I can tell? Should just use readfirstlane if necessary.
> > > > I think this is more of a defect in SelectionDAG because handling this correctly is hard. I’m considering adding a pseudo UniformVGPR register bank for cases where readfirstlane is OK.
> > > The fact that a readfirstlane is generated instead of a waterfall loop is not a defect in SelectionDAG, at least not in general.
> > > 
> > > Frontend languages have builtins whose behavior is undefined when certain arguments are dynamically non-uniform. That is, programmers can use them in a context where the compiler cannot possibly prove that the value will be uniform, but the programmer implicitly guarantees that it will be at runtime. Emitting a readfirstlane for those when necessary is the correct behavior.
> > What about the case where optimizations introduce divergence into the value?
> Such transforms usually aren't optimizations, so mostly they shouldn't be done in the first place...
> 
> Okay, so admittedly there may be some cases where going the waterfall route is actually preferable. If this is really the case, then we probably need to do some more thinking about how to properly represent it, perhaps by having the base intrinsics (interp, load/store with resource descriptors) support divergence via waterfall loops, but adding some kind of "assume uniform" intrinsic whose semantics are that it passes through a value similarly to readfirstlane, except that the behavior is actually undefined if the input value is divergent.
> 
> Something like that seems like a reasonable long-term direction, but I'd say that goes beyond the scope of this change :)
These could also assert uniformity by using readfirstlane themselves.
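
To make the trade-off in this thread concrete, here is a minimal sketch of the two strategies on a toy wavefront model. It is not LLVM code and not what executeInWaterfallLoop emits; Wave, Outcome, readFirstLane, runWithReadFirstLane and runWithWaterfall are hypothetical names chosen for illustration, and the model assumes an empty exec mask simply yields no value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Toy wavefront: one operand value per lane plus a per-lane "active" flag.
// (Hypothetical model for illustration only, not an LLVM data structure.)
struct Wave {
  std::vector<uint32_t> values; // per-lane (VGPR-like) operand, e.g. the m0 input
  std::vector<bool> exec;       // true for lanes that are currently active
};

struct Outcome {
  std::vector<uint32_t> scalarUsed; // scalar each lane ran with (0 if inactive)
  unsigned issues;                  // how many times the instruction was issued
};

// Model of readfirstlane: the value held by the first active lane, if any.
std::optional<uint32_t> readFirstLane(const Wave &w) {
  assert(w.values.size() == w.exec.size());
  for (std::size_t i = 0; i < w.values.size(); ++i)
    if (w.exec[i])
      return w.values[i];
  return std::nullopt;
}

// Strategy 1: a single readfirstlane. Every active lane runs with the first
// active lane's value, which is only correct if the operand is uniform.
Outcome runWithReadFirstLane(const Wave &w) {
  Outcome out{std::vector<uint32_t>(w.values.size(), 0), 0};
  if (auto s = readFirstLane(w)) {
    for (std::size_t i = 0; i < w.values.size(); ++i)
      if (w.exec[i])
        out.scalarUsed[i] = *s;
    out.issues = 1;
  }
  return out;
}

// Strategy 2: waterfall loop. Repeatedly pick the first remaining lane's
// value, run all remaining lanes that share it, then retire those lanes.
Outcome runWithWaterfall(const Wave &w) {
  Outcome out{std::vector<uint32_t>(w.values.size(), 0), 0};
  Wave remaining = w;
  while (auto s = readFirstLane(remaining)) {
    for (std::size_t i = 0; i < remaining.values.size(); ++i) {
      if (remaining.exec[i] && remaining.values[i] == *s) {
        out.scalarUsed[i] = *s;
        remaining.exec[i] = false; // this lane is done
      }
    }
    ++out.issues;
  }
  return out;
}
```

In this model a uniform operand gives the same result under both strategies with a single issue. A divergent operand stays correct per lane under the waterfall loop, at the cost of one issue per distinct value, while the readfirstlane strategy silently runs every lane with the first lane's value. That is the behavior a frontend would be opting into if it asserted uniformity.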

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64862/new/

https://reviews.llvm.org/D64862