[PATCH] D156439: [TLI][AArch64] Extend ReplaceWithVeclib to replace vector FREM instructions for scalable vectors

Mon Jul 31 03:55:40 PDT 2023

jolanta.jensen added inline comments.

================
Comment at: llvm/lib/CodeGen/ReplaceWithVeclib.cpp:103-114
+  if (!TLIFunc) {
+    FunctionType *FTy = nullptr;
+    Type *RetTy = I.getType();
+    if (Masked) {
+      Type *Tys[3] = {RetTy, RetTy,
+                      ToVectorTy(IRBuilder.getInt1Ty(), NumElements)};
+      FTy = FunctionType::get(RetTy, Tys, false);
----------------
mgabka wrote:
> from what I can see this is the only different part of code then in "replaceWithTLIFunction" , hence most of the code below is a code duplication. I think a smart refactoring of replaceWithTLIFunction would be enough to make it work for CI and Instructions.
> 
> 
> The other thing is that the code you wrote duplicates a lot, you can use a suitable container and just add extra element there when you are creating a type for masked function, in that way you do not need to create the type twice and add input types twice.
Lines 129-134 also differ. Line 77 has an assert that is not vaIid here. I created small functions for lines 54-66 and 116-127 that are shared. And I fixed the code duplication above.

================
Comment at: llvm/lib/CodeGen/ReplaceWithVeclib.cpp:141
+  }
+  Replacement->copyMetadata(I);
+  LLVM_DEBUG(dbgs() << DEBUG_TYPE << ": Replaced call to `" << I.getOpcodeName()
----------------
mgabka wrote:
> I guess that this is handled by "replaceAllUsesWith" but worth to check if with a test.
Indeed, it is handled by "replaceAllUsesWith". 

```
void Value::replaceAllUsesWith(Value *New) {
  doRAUW(New, ReplaceMetadataUses::Yes);
}
```
```
void Value::doRAUW(Value *New, ReplaceMetadataUses ReplaceMetaUses) {
...
  if (ReplaceMetaUses == ReplaceMetadataUses::Yes && isUsedByMetadata())
    ValueAsMetadata::handleRAUW(this, New);
...
```
Do we need a test to confirm or is this explanation good enough?

================
Comment at: llvm/lib/CodeGen/ReplaceWithVeclib.cpp:235
+  // We only have TLI mappings for SVE.
+  if (!I.getType()->isScalableTy()) {
+    return false;
----------------
mgabka wrote:
> this is not needed, the mechanism here should work for both fixed and scalable types if mappings exist.
> if we want to reject the transformation for scalable vector types I think we should reject it earlier, i.e where we detect frem.
> 
> 
> please LLVM coding guideline on using braces with simple if statememts https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements, it applies to other places in this patch.
There is no mappings for frem with fixed vectors for SLEEF library so we need to check it's a scalable type.

I'll check the code for breaches of the standard and I'll correct.  

================
Comment at: llvm/lib/CodeGen/ReplaceWithVeclib.cpp:238
+  }
+  auto *VectorArgTy = dyn_cast<VectorType>(I.getType());
+  if (!VectorArgTy) {
----------------
mgabka wrote:
> when moving this outside this function, this check can be combined with the one above since ScalableVectorType class exists.
I changed to ScalableVectorType.

================
Comment at: llvm/lib/CodeGen/ReplaceWithVeclib.cpp:244-247
+  StringRef ScalarName =
+      (ElementType->isFloatTy())
+          ? TLI.getName(LibFunc_fmodf)
+          : ((ElementType->isDoubleTy()) ? TLI.getName(LibFunc_fmod) : "");
----------------
mgabka wrote:
> the function name "replaceInstructionWithCallToVeclib" does not suggest that this is specific to fmod/frem, at the moment we do not have intention to extend it beyond frem, so in my opinion it is worth to replace the function name, to avoid confusion. 
Fixed.

================
Comment at: llvm/lib/CodeGen/ReplaceWithVeclib.cpp:248
+          : ((ElementType->isDoubleTy()) ? TLI.getName(LibFunc_fmod) : "");
+  if (!ScalarName.empty()) {
+    if (!TLI.isFunctionVectorizable(ScalarName)) {
----------------
mgabka wrote:
> please use early exit instead, it is much better idea to use early exits from function instead long nested if block, it improves code readability.
> please apply it where possible to other checks you are doing
Fixed.

================
Comment at: llvm/lib/CodeGen/ReplaceWithVeclib.cpp:256-257
+        std::string(TLI.getVectorizedFunction(ScalarName, NumElements));
+    const std::string TLINameMasked =
+        std::string(TLI.getVectorizedFunction(ScalarName, NumElements, true));
+    LLVM_DEBUG(dbgs() << DEBUG_TYPE << ": Looking up TLI mapping for `"
----------------
mgabka wrote:
> why looking for it in cases where you are not going to use it? it isn't efficient. please change it.
Fixed.

================
Comment at: llvm/lib/CodeGen/ReplaceWithVeclib.cpp:258-260
+    LLVM_DEBUG(dbgs() << DEBUG_TYPE << ": Looking up TLI mapping for `"
+                      << ScalarName << "` and vector width " << NumElements
+                      << ".\n");
----------------
mgabka wrote:
> this message is printed actually after looking for the mappings, so probably should be moved up.
Fixed.

================
Comment at: llvm/lib/CodeGen/ReplaceWithVeclib.cpp:281
     if (auto *CI = dyn_cast<CallInst>(&I)) {
       if (replaceWithCallToVeclib(TLI, *CI)) {
+        ReplacedCalls.push_back(&I);
----------------
mgabka wrote:
> Hi Jolanta,
> in my opinion it would be better to have a single main entry point here, and then branch from inside replaceWithCallToVeclib, thanks to it you can void some of the code duplication, like for example the debug messages.
Fixed.

================
Comment at: llvm/test/CodeGen/AArch64/replace-intrinsics-with-veclib-armpl.ll:388
+;
+  %out = frem <vscale x 2 x double> %in, shufflevector (<vscale x 2 x double> insertelement (<vscale x 2 x double> poison, double 7.000000e+00, i64 0), <vscale x 2 x double> poison, <vscale x 2 x i32> zeroinitializer)
+  ret <vscale x 2 x double> %out
----------------
mgabka wrote:
> having both arguments of frem as inputs to this function would simplify the test, same apply to other tests.
> 
> 
> The other thing is that other tests in this file are using fast math flags, so please use it for frem as well.
Fixed.

================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/sleef-calls-aarch64.ll:2
 ; Do NOT use -O3. It will lower exp2 to ldexp, and the test will fail.
-; RUN: opt -vector-library=sleefgnuabi -replace-with-veclib < %s | opt -vector-library=sleefgnuabi -passes=inject-tli-mappings,loop-unroll,loop-vectorize -S | FileCheck %s --check-prefixes=CHECK,NEON
-; RUN: opt -mattr=+sve -vector-library=sleefgnuabi -replace-with-veclib < %s | opt -vector-library=sleefgnuabi -passes=inject-tli-mappings,loop-unroll,loop-vectorize -S | FileCheck %s --check-prefixes=CHECK,SVE
+; RUN: opt -vector-library=sleefgnuabi -passes=inject-tli-mappings,loop-unroll,loop-vectorize -S < %s | FileCheck %s --check-prefixes=CHECK,NEON
+; RUN: opt -mattr=+sve -vector-library=sleefgnuabi -passes=inject-tli-mappings,loop-unroll,loop-vectorize -S < %s | FileCheck %s --check-prefixes=CHECK,SVE
----------------
mgabka wrote:
> could you explain why did you decide to modify this test?
As I understand [[ https://github.com/llvm/llvm-project/commit/6577cef9b03fb4d151bd997a720ceaf78447f6a2 | the commit message  ]] and the implementation itself, ReplaceWithVeclib pass is to be run on intrinsics operating on vectors not scalar values. This pass did make any changes to the IR and is only confusing.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156439/new/

https://reviews.llvm.org/D156439