[PATCH] D135386: [RISCV] Use mask agnostic policy for isel patterns where the merge operand is IMPLICIT_DEF.
Philip Reames via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Oct 6 15:08:29 PDT 2022
reames accepted this revision.
reames added a comment.
This revision is now accepted and ready to land.
LGTM
================
Comment at: llvm/lib/Target/RISCV/RISCVInstrInfoVVLPatterns.td:1957
VLOpFrag)),
(!cast<Instruction>("PseudoVSLIDEDOWN_VI_"#vti.LMul.MX)
(vti.Vector (IMPLICIT_DEF)), vti.RegClass:$rs1, uimm5:$rs2,
----------------
craig.topper wrote:
> reames wrote:
> > Slide down has special wording around the handling of the mask. Could you move this to it's own patch? I think this needs careful consideration on it's own.
> >
> > (Same for slide up)
> > Slide down has special wording around the handling of the mask. Could you move this to it's own patch? I think this needs careful consideration on it's own.
> >
> > (Same for slide up)
>
> Can you paste the special wording you are referring to?
>
> This patch doesn't touch slideup because slideup always reads the destination register for the elements less than OFFSET. So it doesn't have an equivalent of this pattern.
>From 16.3:
The tail agnostic/undisturbed policy is followed for tail elements.
The slide instructions may be masked, with mask element i controlling whether destination element i is written. The mask
undisturbed/agnostic policy is followed for inactive elements.
Reading that again, that looks like the natural interpretation. On first read, I'd gotten confused on the wording of the last sentence.
================
Comment at: llvm/test/CodeGen/RISCV/rvv/ceil-vp.ll:14
; CHECK-NEXT: flh ft0, %lo(.LCPI0_0)(a1)
-; CHECK-NEXT: vsetvli zero, a0, e16, mf4, ta, mu
+; CHECK-NEXT: vsetvli zero, a0, e16, mf4, ta, ma
; CHECK-NEXT: vfabs.v v9, v8, v0.t
----------------
craig.topper wrote:
> reames wrote:
> > Hm, this example and several others in the diff appear to be a regression in practice.
> >
> > It looks like we're missing a couple of cases in InsertVSETVLI. The general pattern is promoting a MA use to MU if doing so would allow avoiding an extra vsetvli. I see two forms in this test case.
> >
> > There's a general profitability question though. Does switching from MA to MU ever cost enough execution wise to be worth the extra vsetvli? I can't think of any cases, but maybe?
> Definitely in this case, using MU can't have any performance effect. Using mu creates a false dependency on v9 for the vfcvt.x.f.v. But v9 was already used to create the mask, so last writer of v9 already finished executing.
>
> Though there's nothing forcing the vfcvt.x.f.v to have the same destination as the vfabs.v. That's just lucky register allocation.
Let's chat about this a bit offline. I don't think this is a blocking concern, I just want to figure out what's a reasonable heuristic for InsertVSETVLI.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D135386/new/
https://reviews.llvm.org/D135386
More information about the llvm-commits
mailing list