[PATCH] D130442: [RISCV] Peephole optimization to fold merge.vvm and unmasked intrinsics.

Thu Jul 28 13:49:10 PDT 2022

craig.topper added inline comments.

================
Comment at: llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp:2559
+    // Skip if True has merge operand.
+    if (RISCVII::hasMergeOp(TII->get(TrueOpc).TSFlags))
+      continue;
----------------
reames wrote:
> craig.topper wrote:
> > reames wrote:
> > > I believe you have a missing check here.  Don't we need the passthrough for True to be False?  Otherwise, it seems like the vmerge might be merging two unrelated values?
> > > 
> > > We could potentially phrase this as an eager demanded lane style push back of the mask which wouldn't require that check, but that doesn't seem consistent with your code below.  
> > This is only handling the case for True being an unmasked, tail agnostic instruction. That's why it calls `lookupMaskedIntrinsicByUnmaskedTA` below.
> Right, but that's insufficient.
> 
> Converting:
> vadd v1, v2, v3
> vmul v4, v5, v6
> vmerge v7, v1, v4, v0
> 
> To:
> vadd v7 v2, v3, v0
> vmul v4, v5, v6
> 
> Doesn't seem like what we want?
> 
> There's a variant here which would be legal, but it doesn't seem to be the transform implemented here.  That would be:
> vadd v7 v2, v3, v0
> v0 = not v0
> vmul v7, v5, v6, v0
Note the earlier check that the Merge operand and False operand of the VMERGE are the same.

================
Comment at: llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp:2608
+    // Try to transform Result to unmasked intrinsic.
+    doPeepholeMaskedRVV(Result);
+    MadeChange = true;
----------------
reames wrote:
> craig.topper wrote:
> > reames wrote:
> > > This doesn't make any sense.  The only way this could succeed is if the mask was all ones; we shouldn't have a merge with an all ones mask to begin with right?
> > A vp.merge intrinsic can be emitted by the BSC vectorizer to represent a tail undisturbed merge. This happens in reduction loops where the last iteration has less elements than the previous iteration. So we need to merge the upper elements from a previous iteration under a tail undisturbed policy. The mask on the vp.merge would be all 1s.
> Wouldn't an insert_sub_vector be a better representation here?  Or is your point that we had an insert_sub_vector, and that got lowered to the vmerge?  I'm guessing yes, that's your point because thinking that through seems to make sense.  
> 
> I guess we don't have a dedicated insert_sub_reg analogous instruction.  VADD could work - in analogy to scalar move idioms - but I don't have a strong reason to prefer that over vmerge if that's what we're currently using.  
> 
> 
The register size didn’t change. Only the VL. There is no statically sized subvector.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130442/new/

https://reviews.llvm.org/D130442