[PATCH] D156439: [TLI][AArch64] Extend ReplaceWithVeclib to replace vector FREM instructions for scalable vectors

Thu Aug 10 01:14:51 PDT 2023

mgabka added inline comments.

================
Comment at: llvm/include/llvm/Analysis/VectorUtils.h:184
+tryDemangleForVFABI(StringRef MangledName, const Module &M,
+                    std::optional<ElementCount> EC = std::nullopt);

----------------
please add documentation for the new added parameter.

in my opinion the interface will be better if we use it for both scalable and fixed vectors, otherwise it is confusing what this parameter is and when it is used.

================
Comment at: llvm/lib/CodeGen/ReplaceWithVeclib.cpp:53
+      CI ? ElementCount::getFixed(0)
+         : (dyn_cast<ScalableVectorType>(I.getType()))->getElementCount();
   if (!TLIFunc) {
----------------
this may fail, not all instructions used here will be returning Vector.
You need the EC only for frem case to create Type for the mask, but in this case you have guaranteed that it returns a vector type. Please change it.

================
Comment at: llvm/lib/CodeGen/ReplaceWithVeclib.cpp:57
                       << "` to module.\n");
-
     ++NumTLIFuncDeclAdded;
----------------
nod needed change

================
Comment at: llvm/lib/CodeGen/ReplaceWithVeclib.cpp:76
+          return false;
+        unsigned MaskPos = Info->getParamIndexForOptionalMask().value();
+        LLVM_DEBUG(dbgs() << DEBUG_TYPE << ": Mask position `" << MaskPos
----------------
The whole idea about std::optional is that the value may not exist, https://en.cppreference.com/w/cpp/utility/optional/value
and you need to handle it otherwise you will get exception.

================
Comment at: llvm/lib/CodeGen/ReplaceWithVeclib.cpp:77
+        unsigned MaskPos = Info->getParamIndexForOptionalMask().value();
+        LLVM_DEBUG(dbgs() << DEBUG_TYPE << ": Mask position `" << MaskPos
+                          << "`.\n");
----------------
IMO this dbg message is not needed as it is too detailed. LLVM tends to have more high level dbg output otherwise the dbg log are growing too quickly.

for this pass having a dbg messages to show that the is replacement and what has been replaced with what in my opinion is enough.

================
Comment at: llvm/lib/CodeGen/ReplaceWithVeclib.cpp:92-111
+  CallInst *Replacement = nullptr;
+  if (CI) {
+    // Intrinsics handling.
+    SmallVector<Value *> Args(CI->args());
+    // Preserve the operand bundles.
+    SmallVector<OperandBundleDef, 1> OpBundles;
+    CI->getOperandBundlesAsDefs(OpBundles);
----------------
jolanta.jensen wrote:
> mgabka wrote:
> > mgabka wrote:
> > > this can be written in much more compacted way for example:
> > > 
> > > ```
> > > SmallVector<Value *> Args(I.operand_values());
> > > SmallVector<OperandBundleDef, 1> OpBundles;
> > > // Preserve the operand bundles if it is an intrinsic call.
> > > if (CI)
> > >   CI->getOperandBundlesAsDefs(OpBundles);
> > > // for masked calls to frem add a mask operand =
> > > else if (Masked)
> > >        Args.push_back(IRBuilder.getAllOnesMask(NumElements));
> > > 
> > > CallInst *Replacement = IRBuilder.CreateCall(TLIFunc, Args, OpBundles);
> > > if (isa<FPMathOperator>(Replacement)) {
> > >     // Preserve fast math flags for FP math.
> > >     Replacement->copyFastMathFlags(&I);
> > >   }
> > > ```
> > I was wrong here, when the Instruction is a CI has the call as the first operand so it needs to be a bit different, I also think that it would be good to handle always mask (assuming it is always the last operand)
> > 
> > ```
> > SmallVector<Value *> Args;
> > SmallVector<OperandBundleDef, 1> OpBundles;
> > // Preserve the operand bundles and copy arguments if it is an intrinsic call.
> > if (CI) {
> >   Args.assign(CI->arg_begin(), CI->arg_end());
> >   CI->getOperandBundlesAsDefs(OpBundles);
> >  }
> > else
> >   Args.assign(I.op_begin(), I.op_end());
> > // if mask is requested we need to add it
> > if (Masked)
> >   Args.push_back(IRBuilder.getAllOnesMask(NumElements));
> > 
> > CallInst *Replacement = IRBuilder.CreateCall(TLIFunc, Args, OpBundles);
> > ```
> > 
> This is part of the original implementation and I would prefer not to touch it. And CallInst part does not handle scalable vectors so no mask. And I don't think we even can assume mask is always the last operand even if so is the case for FRem.
The final goal is to make this pass work for scalable vectors, so introducing unnecessary if/else blocks to just remove them later in my opinion is not a good idea.
Since we now want to use getParamIndexForOptionalMask we will now the place of mask.

================
Comment at: llvm/lib/CodeGen/ReplaceWithVeclib.cpp:73
+      TLIFunc->copyAttributesFrom(OldFunc);
+      assert(OldFuncTy == TLIFunc->getFunctionType() &&
+             "Expecting function types to be identical");
----------------
jolanta.jensen wrote:
> mgabka wrote:
> > IMO  this assert is redundant,  it is essentially checking if the Function::Create works correctly, what is not needed in my opinion.
> This assert is part of the original implementation, removing it would be introducing unnecessary nfc (even if it has been moved to avoid creation of an extra variable -- so the move is a nfc anyway). I would like to keep it since it is a part of the original implementation.
FWIW this you already refactored the code quite a bit so removing not needed bit is not a problem as part of this work.

================
Comment at: llvm/test/CodeGen/AArch64/replace-intrinsics-with-veclib-armpl.ll:195

+; NOTE: TLI mappings for FREM instruction.
+
----------------
a comment explaining why the transformation is not happening would be useful here.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156439/new/

https://reviews.llvm.org/D156439