[PATCH] D124387: AMDGPU: Fold out readfirstlane between vgpr to vgpr copies

Mon Apr 25 13:36:52 PDT 2022

arsenm added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/SIFoldOperands.cpp:1858
+        //
+        // => %2 = COPY %0
+        //
----------------
arsenm wrote:
> foad wrote:
> > b-sumner wrote:
> > > arsenm wrote:
> > > > b-sumner wrote:
> > > > > foad wrote:
> > > > > > This transformation only makes sense if you know that %0 is uniform. I think @nhaehnle has suggested introducing a "readanylane" pseudo and/or intrinsic for that kind of use case.
> > > > > > 
> > > > > > I'm not sure if there is any existing code that deliberately uses readfirstlane on a non-uniform argument, but if there is then this will break it.
> > > > > We use readfirstlane to "elect" a value from the currently active lines.  The argument is likely not uniform, and breaking such code would be problematic.
> > > > I thought this was wrong at first but don't see where the problem is. If you're reading the value back into a VGPR with the same exec mask at a later point, where is the difference? At the copy to VGPR, you're copying the from the same lane
> > > This use of readfirstlane is broadcasting the value in the elected lane (.e. the first lane) to all other active lanes.
> > In the original code, every lane gets the same value in %2. If you remove the readfirstlane, they might get different values (if %0 is non-uniform).
> It's broadcast to a scalar value, but as soon as you have a vector use, it's reduced down to the active lanes again. It's only uniform for scalar uses
Oh, I see here. I'm checking the wrong exec def point. I need to check exec at the point the readfirstlane source is defined, not the readfirstlane itself

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124387/new/

https://reviews.llvm.org/D124387