[PATCH] D63225: AMDGPU: Fold readlane from copy of SGPR or imm

Stanislav Mekhanoshin via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sun Jun 16 12:00:17 PDT 2019


rampitec added inline comments.


================
Comment at: lib/Target/AMDGPU/SIFoldOperands.cpp:525
+      // %vgpr = V_MOV_B32 imm
+      // %sgpr = V_READFIRSTLANE_B32 %vgpr
+      // =>
----------------
nhaehnle wrote:
> rampitec wrote:
> > arsenm wrote:
> > > rampitec wrote:
> > > > arsenm wrote:
> > > > > rampitec wrote:
> > > > > > arsenm wrote:
> > > > > > > rampitec wrote:
> > > > > > > > arsenm wrote:
> > > > > > > > > rampitec wrote:
> > > > > > > > > > I do not think it is correct. If first lane is off you will fold wrong value. In both cases of readfirstlane and readlane you need to check a specific lane, which I am not sure even possible.
> > > > > > > > > readlane ignores exec. readfirstlane only reads exec to find the first lane, and still returns a result when  exec is 0. The source value is a constant, so it doesn't matter what the lane is
> > > > > > > > Readlane ignores exec, but v_mov_b32 does not. So you may have an old value of vgpr instead of immediate.
> > > > > > > > For readfirstlane you will get wrong value on the sgpr if exec=0 I believe.
> > > > > > > It would matter if there was an exec def between the copy and the use, which I want to ban within a block
> > > > > > But it is not banned yet, right? Are we sure that only happens within a block? Why block cannot be executed with a zero exec? And then at the very least we need to be sure register is virtual.
> > > > > We force execz skip branches around blocks that have readlane/readfirstlane in them. exec = 0 should also be illegal on function entry
> > > > I thought about it. From the ISA perspective it is illegal if you are reading an inactive lane. This can be legal here on the grounds that register is virtual and we are in SSA. Are both conditions true?
> > > SGPR spilling does read/write VGPRs for inactive lanes, but the use of an SSA value for an invalid lane doesn't make sense. I'm not sure if the exec check is strictly necessary, but I've added it. 
> > I do not think this check really changes anything. Look: you have lane 8 disabled, both at def and use. Then you do readlane for the lane 8. It will pass the check and yield wrong value. So this check is not necessary and does not help too.
> > 
> > There are just two conditions which may make it legal: 1) register is virtual 2) we are in SSA. If both a satisfied you can do it, because there can be no other valid def anyway.
> You do need the EXEC mask check either way because the use could be observing a def inside a divergent loop.
> 
> For the readfirstlane case, that check should be sufficient. For the readlane case, I agree that you need to know the register is virtual and in SSA. But with that knowledge, I believe the code should be good as-is.
Right. Exec check probably needed, at any rate it doesn't hurt. But readlane case needs either be removed or additional check needed.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63225/new/

https://reviews.llvm.org/D63225





More information about the llvm-commits mailing list