[PATCH] D124387: AMDGPU: Fold out readfirstlane between vgpr to vgpr copies

Mon Apr 25 13:32:16 PDT 2022

foad added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/SIFoldOperands.cpp:1858
+        //
+        // => %2 = COPY %0
+        //
----------------
b-sumner wrote:
> arsenm wrote:
> > b-sumner wrote:
> > > foad wrote:
> > > > This transformation only makes sense if you know that %0 is uniform. I think @nhaehnle has suggested introducing a "readanylane" pseudo and/or intrinsic for that kind of use case.
> > > > 
> > > > I'm not sure if there is any existing code that deliberately uses readfirstlane on a non-uniform argument, but if there is then this will break it.
> > > We use readfirstlane to "elect" a value from the currently active lanes.  The argument is likely not uniform, and breaking such code would be problematic.
> > I thought this was wrong at first but don't see where the problem is. If you're reading the value back into a VGPR with the same exec mask at a later point, where is the difference? At the copy to VGPR, you're copying from the same lane.
> This use of readfirstlane is broadcasting the value in the elected lane (i.e. the first lane) to all other active lanes.
In the original code, every lane gets the same value in %2. If you remove the readfirstlane, they might get different values (if %0 is non-uniform).
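
To make the difference concrete, here is a toy lane-level model in Python. It is only a sketch and not the actual hardware or MIR semantics: the helper names (`read_first_lane`, `copy_sgpr_to_vgpr`, `copy_vgpr_to_vgpr`), the 4-lane wave, and the intermediate value `s1` are assumptions made for illustration, and the all-lanes-inactive case is not modelled.

```python
def read_first_lane(vgpr, exec_mask):
    """Toy readfirstlane: value held by the lowest-numbered active lane."""
    for lane, active in enumerate(exec_mask):
        if active:
            return vgpr[lane]
    raise ValueError("toy model does not cover an all-zero exec mask")


def copy_sgpr_to_vgpr(sgpr, dst, exec_mask):
    """Every active lane receives the same scalar; inactive lanes keep dst."""
    return [sgpr if active else old for old, active in zip(dst, exec_mask)]


def copy_vgpr_to_vgpr(src, dst, exec_mask):
    """Every active lane receives its own lane's value from src."""
    return [s if active else old for s, old, active in zip(src, dst, exec_mask)]


exec_mask = [False, True, True, True]  # lane 0 inactive, lanes 1-3 active
undef = [None] * 4                     # prior contents of the destination

# Non-uniform %0: each lane holds a different value.
v0 = [10, 11, 12, 13]
s1 = read_first_lane(v0, exec_mask)                  # elected value from lane 1
original = copy_sgpr_to_vgpr(s1, undef, exec_mask)   # readfirstlane, then copy
folded = copy_vgpr_to_vgpr(v0, undef, exec_mask)     # %2 = COPY %0

# Uniform %0: every lane holds the same value, so the fold is harmless.
u0 = [7, 7, 7, 7]
original_u = copy_sgpr_to_vgpr(read_first_lane(u0, exec_mask), undef, exec_mask)
folded_u = copy_vgpr_to_vgpr(u0, undef, exec_mask)
```

In this model `original` and `folded` agree only in the elected lane when `v0` is non-uniform, while `original_u` and `folded_u` are identical, which is why the fold needs to know that %0 is uniform.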

https://reviews.llvm.org/D124387