[PATCH] D130442: [RISCV] Peephole optimization to fold merge.vvm and unmasked intrinsics.
Craig Topper via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Jul 28 13:49:10 PDT 2022
craig.topper added inline comments.
================
Comment at: llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp:2559
+ // Skip if True has merge operand.
+ if (RISCVII::hasMergeOp(TII->get(TrueOpc).TSFlags))
+ continue;
----------------
reames wrote:
> craig.topper wrote:
> > reames wrote:
> > > I believe you have a missing check here. Don't we need the passthrough for True to be False? Otherwise, it seems like the vmerge might be merging two unrelated values?
> > >
> > > We could potentially phrase this as an eager demanded lane style push back of the mask which wouldn't require that check, but that doesn't seem consistent with your code below.
> > This is only handling the case for True being an unmasked, tail agnostic instruction. That's why it calls `lookupMaskedIntrinsicByUnmaskedTA` below.
> Right, but that's insufficient.
>
> Converting:
> vadd v1, v2, v3
> vmul v4, v5, v6
> vmerge v7, v1, v4, v0
>
> To:
> vadd v7 v2, v3, v0
> vmul v4, v5, v6
>
> Doesn't seem like what we want?
>
> There's a variant here which would be legal, but it doesn't seem to be the transform implemented here. That would be:
> vadd v7 v2, v3, v0
> v0 = not v0
> vmul v7, v5, v6, v0
Note the earlier check that the Merge operand and False operand of the VMERGE are the same.
================
Comment at: llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp:2608
+ // Try to transform Result to unmasked intrinsic.
+ doPeepholeMaskedRVV(Result);
+ MadeChange = true;
----------------
reames wrote:
> craig.topper wrote:
> > reames wrote:
> > > This doesn't make any sense. The only way this could succeed is if the mask was all ones; we shouldn't have a merge with an all ones mask to begin with right?
> > A vp.merge intrinsic can be emitted by the BSC vectorizer to represent a tail undisturbed merge. This happens in reduction loops where the last iteration has less elements than the previous iteration. So we need to merge the upper elements from a previous iteration under a tail undisturbed policy. The mask on the vp.merge would be all 1s.
> Wouldn't an insert_sub_vector be a better representation here? Or is your point that we had an insert_sub_vector, and that got lowered to the vmerge? I'm guessing yes, that's your point because thinking that through seems to make sense.
>
> I guess we don't have a dedicated insert_sub_reg analogous instruction. VADD could work - in analogy to scalar move idioms - but I don't have a strong reason to prefer that over vmerge if that's what we're currently using.
>
>
The register size didn’t change. Only the VL. There is no statically sized subvector.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D130442/new/
https://reviews.llvm.org/D130442
More information about the llvm-commits
mailing list