[PATCH] D117909: [AMDGPU] Remove cndmask from readsExecAsData

Stanislav Mekhanoshin via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Jan 24 10:57:35 PST 2022


rampitec added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:133
 
 static bool readsExecAsData(const MachineInstr &MI) {
   if (MI.isCompare())
----------------
ruiling wrote:
> foad wrote:
> > ruiling wrote:
> > > The function name is confusing, all these instructions access EXEC just as normal vector instructions. One cause of the failure @sebastian-ne showed in D116053 is we are dropping convergent attribute when lowering amdgcn.icmp into V_CMP. I am not sure if there is any other reason.  I think may be a right fix for the issue is define convergent version of V_CMP, I am not sure whether it sounds reasonable? Please also add some comments this is just workaround for the issue that we are lowering convergent icmp into normal V_CMP. We should fix it some other way.
> > > I think may be a right fix for the issue is define convergent version of V_CMP, I am not sure whether it sounds reasonable?
> > 
> > Yes, sounds good to me. There is also a related problem in that the "convergent" attribute in IR prevents an instruction from being sunk into an "if" but it does not prevent it from being hoisted out of an "if". For ballot we need both. I'm not sure if the MachineIR convergent flag does both.
> 
> > Yes, sounds good to me. There is also a related problem in that the "convergent" attribute in IR prevents an instruction from being sunk into an "if" but it does not prevent it from being hoisted out of an "if". For ballot we need both. I'm not sure if the MachineIR convergent flag does both.
> 
> MachineIR have prohibited both sinking and hoisting.
The function name and the whole idea reflects that unlike other VALU where exec just predicates the execution these instructions use exec much like a normal operand which is reflected in the actual numerical answer. I.e., for cmp that is:

dst = cmp(src0, src1) & exec

For readfirstlane that is:

dst = vgpr[cttz(exec)]

Marking cmp as convergent might be a better fix because it will still allow to move compares within a block, although it sounds a little orthogonal to the exec use modeling to me.


================
Comment at: llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:140
     break;
   case AMDGPU::V_READFIRSTLANE_B32:
-  case AMDGPU::V_CNDMASK_B64_PSEUDO:
----------------
ruiling wrote:
> Did you see a failure caused by v_readfirstlane? if not, please also remove it.
No, because it is convergent. But technically is has the same effect of using exec.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D117909/new/

https://reviews.llvm.org/D117909



More information about the llvm-commits mailing list