[PATCH] D69661: [AMDGPU] Fix vccz after v_readlane/v_readfirstlane to vcc_lo/hi

Matt Arsenault via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Nov 1 15:39:47 PDT 2019


arsenm added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1409
+      // Writes to vcc will fix it. Only examine explicit defs.
+      for (auto &Op : Inst.defs()) {
+        switch (Op.getReg()) {
----------------
foad wrote:
> arsenm wrote:
> > foad wrote:
> > > arsenm wrote:
> > > > foad wrote:
> > > > > arsenm wrote:
> > > > > > This won't catch the implicit def in inline asm for example
> > > > > I specifically wanted to avoid treating this instruction (from the test case) as a write to vcc, despite its implicit-def:
> > > > > ```
> > > > > $vcc_hi = V_READFIRSTLANE_B32 killed $vgpr0, implicit $exec, implicit-def $vcc
> > > > > ```
> > > > > What kind of inline asm are you thinking of?
> > > > Any inline asm that touches vcc will appear as an implicit-def. You can't know without context that an implicit-def isn't really modifying the register. I'm guessing this one is to model the super-register def?
> > > I don't know exactly why the readfirstlane has an implicit def of vcc. I still think the most conservative fix is to only look for explicit defs of vcc_lo/vcc_hi, which is what my patch does.
> > Looking only at explicit defs is less conservative, as it will miss real defs. It will miss both inlineasm/call like operations, as well as the standard VCC defs as the VOPC vcc def appears in the implicit def list
> It is crucial that we treat this instruction:
> ```
> $vcc_hi = V_READFIRSTLANE_B32 killed $vgpr0, implicit $exec, implicit-def $vcc
> ```
> as a def of vcc_hi (which corrupts vccz) and NOT as a def of vcc (which fixes vccz), despite the implicit-def, otherwise we will not be fixing the bug.
> 
> I suppose I could handle this with something like:
> ```
> if (Inst.definesRegister(VCC_LO) || Inst.definesRegister(VCC_HI))
>   VCCZCorrect = false;
> else if (Inst.definesRegister(VCC))
>   VCCZCorrect = true;
> ```
> where the "else" ensures that V_READFIRSTLANE_B32 will fall under the first condition not the second. Would you prefer that?
> 
> (Or, stepping back a bit, is it wrong for V_READFIRSTLANE_B32 to have an implicit-def of vcc in the first place?)
That may or may not work; I would have to re-check the details of how definesRegister/modifiesRegister are implemented. You should add a few tests stressing the implicit defs to be sure

The implicit def of the super register is probably correct. What I assume happened here is that the implicit def of the super register is necessary to keep the verifier happy. If you look at  SIInstrInfo::copyPhysReg for example, it adds a def of the super register on the first copy so that the later subregister defs don't think they are reading undef other lanes:

    if (Idx == 0)
      Builder.addReg(DestReg, RegState::Define | RegState::Implicit);


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69661/new/

https://reviews.llvm.org/D69661





More information about the llvm-commits mailing list