[PATCH] D135386: [RISCV] Use mask agnostic policy for isel patterns where the merge operand is IMPLICIT_DEF.

Thu Oct 6 15:08:29 PDT 2022

reames accepted this revision.
reames added a comment.
This revision is now accepted and ready to land.

LGTM

================
Comment at: llvm/lib/Target/RISCV/RISCVInstrInfoVVLPatterns.td:1957
                                             VLOpFrag)),
             (!cast<Instruction>("PseudoVSLIDEDOWN_VI_"#vti.LMul.MX)
                 (vti.Vector (IMPLICIT_DEF)), vti.RegClass:$rs1, uimm5:$rs2,
----------------
craig.topper wrote:
> reames wrote:
> > Slide down has special wording around the handling of the mask.  Could you move this to it's own patch?  I think this needs careful consideration on it's own.
> > 
> > (Same for slide up)
> > Slide down has special wording around the handling of the mask.  Could you move this to it's own patch?  I think this needs careful consideration on it's own.
> > 
> > (Same for slide up)
> 
> Can you paste the special wording you are referring to?
> 
> This patch doesn't touch slideup because slideup always reads the destination register for the elements less than OFFSET. So it doesn't have an equivalent of this pattern.
>From 16.3:
The tail agnostic/undisturbed policy is followed for tail elements.
The slide instructions may be masked, with mask element i controlling whether destination element i is written. The mask
undisturbed/agnostic policy is followed for inactive elements.

Reading that again, that looks like the natural interpretation.  On first read, I'd gotten confused on the wording of the last sentence.  

================
Comment at: llvm/test/CodeGen/RISCV/rvv/ceil-vp.ll:14
 ; CHECK-NEXT:    flh ft0, %lo(.LCPI0_0)(a1)
-; CHECK-NEXT:    vsetvli zero, a0, e16, mf4, ta, mu
+; CHECK-NEXT:    vsetvli zero, a0, e16, mf4, ta, ma
 ; CHECK-NEXT:    vfabs.v v9, v8, v0.t
----------------
craig.topper wrote:
> reames wrote:
> > Hm, this example and several others in the diff appear to be a regression in practice.  
> > 
> > It looks like we're missing a couple of cases in InsertVSETVLI.  The general pattern is promoting a MA use to MU if doing so would allow avoiding an extra vsetvli.  I see two forms in this test case.
> > 
> > There's a general profitability question though.  Does switching from MA to MU ever cost enough execution wise to be worth the extra vsetvli?  I can't think of any cases, but maybe?
> Definitely in this case, using MU can't have any performance effect. Using mu creates a false dependency on v9 for the vfcvt.x.f.v. But v9 was already used to create the mask, so last writer of v9 already finished executing.
> 
> Though there's nothing forcing the vfcvt.x.f.v to have the same destination as the vfabs.v. That's just lucky register allocation.
Let's chat about this a bit offline.  I don't think this is a blocking concern, I just want to figure out what's a reasonable heuristic for InsertVSETVLI.  

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D135386/new/

https://reviews.llvm.org/D135386