[PATCH] D132585: [VPlan] Add field to track if intrinsic should be used for call. (NFC)

Ayal Zaks via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Aug 29 07:16:06 PDT 2022


Ayal added inline comments.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:4167
 
   Intrinsic::ID ID = getVectorIntrinsicIDForCall(&CI, TLI);
 
----------------
This is now also redundant given VectorIntrinsicID?


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:4176
       Value *Arg;
-      if (!UseVectorIntrinsic ||
-          !isVectorIntrinsicWithScalarOpAtArg(ID, I.index()))
+      if (VectorIntrinsicID == Intrinsic::not_intrinsic ||
+          !isVectorIntrinsicWithScalarOpAtArg(VectorIntrinsicID, I.index()))
----------------
nit: can ask if (!VectorIntrinsicID || ...) given that Intrinsic::not_intrinsic is fixed to zero.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:4187
     Function *VectorF;
-    if (UseVectorIntrinsic) {
+    if (VectorIntrinsicID != Intrinsic::not_intrinsic) {
       // Use vector version of the intrinsic.
----------------
nit: can ask if (VectorIntrinsicID) given that Intrinsic::not_intrinsic is fixed to zero.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8321
+  bool CanUseVectorIntrinsic =
+      ID != Intrinsic::not_intrinsic &&
+      LoopVectorizationPlanner::getDecisionAndClampRange(
----------------
nit: can ask if ID given that Intrinsic::not_intrinsic is fixed to zero.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8329
+        InstructionCost CallCost =
+            CM.getVectorCallCost(CI, VF, NeedToScalarize);
+        InstructionCost IntrinsicCost =
----------------
fhahn wrote:
> Ayal wrote:
> > Avoid considering CallCost if NeedToScalarize is true?
> > 
> > Avoid getting decision and clamping Range if !ID, when a vector call can be used, e.g., w/o clamping Range (WillWiden)?
> > 
> > The compound decision for which (range of) VF's to use an intrinsic vs. call vs. neither should probably be retained instead of decomposing it into two independent clamps? Calls for better test coverage to make sure patch is indeed NFC.
> > Avoid considering CallCost if NeedToScalarize is true?
> 
> I am not sure if we need to handle this explicitly, as the cost comparison should either chose the vector intrinsic (if it is cheaper than the lib call which may get scalarized) or `CanUseVectorCall` will be also false.
> 
> > Avoid getting decision and clamping Range if !ID, when a vector call can be used, e.g., w/o clamping Range (WillWiden)?
> 
> Added a check, thanks!
> 
> > The compound decision for which (range of) VF's to use an intrinsic vs. call vs. neither should probably be retained instead of decomposing it into two independent clamps? Calls for better test coverage to make sure patch is indeed NFC.
> 
> I think we need to clamp both separately. Before, we could have VPlans where we either use lib functions or intrinsics for the same call for different VFs. Now we need to split them to track whether an intrinsic or libfunc should be used. I added a test case to show this: 005d1a8ff533
> 
> It should only change the debug output (VPlan printing) but not the generated code, so arguably this can be considered NFC (from the perspective of the generated code) or not.
>> The compound decision for which (range of) VF's to use an intrinsic vs. call vs. neither should probably be retained instead of decomposing it into two independent clamps? Calls for better test coverage to make sure patch is indeed NFC.

> I think we need to clamp both separately. Before, we could have VPlans where we either use lib functions or intrinsics for the same call for different VFs. Now we need to split them to track whether an intrinsic or libfunc should be used. I added a test case to show this: 005d1a8ff533

Hmm, getDecisionAndClampRange() works with boolean decisions rather than 3-way ones. May result in excessive clamping, which is ok albeit potentially conservative. E.g., say first VF=2 of range can make a vector call but next VF=4 cannot, where both can more efficiently make an intrinsic call, range would clamp after VF=2 needlessly.

One way to optimize the clamping is to figure out the compound decision for first VF of range and then getDecisionAndClampRange() accordingly - worth the hassle?


```
      bool ScalarBetterThanVectorAtStart;
      InstructionCost CallCostAtStart =
                CM.getVectorCallCost(CI, Range.Start, ScalarBetterThanVectorAtStart);
      bool IntrinsicBestAtStart = ID && CM.getVectorIntrinsicCost(CI, Range.Start) < CallCostAtStart;

      LoopVectorizationPlanner::getDecisionAndClampRange(
          [&](ElementCount VF) -> bool {
            bool ScalarBetterThanVectorAtVF;
            // Is it beneficial to perform intrinsic call compared to lib call?
            InstructionCost CallCostAtVF =
                CM.getVectorCallCost(CI, VF, ScalarBetterThanVectorAtVF);
            bool IntrinsicBestAtVF = ID && CM.getVectorIntrinsicCost(CI, VF) < CallCostAtVF;
            return (IntrinsicBestAtStart == IntrinsicBestAtVF) &&
                       (IntrinsicBestAtStart || ScalarBetterThanVectorAtVF == ScalarBetterThanVectorAtVF);
          },
          Range);
```

CM.getVectorCallCost() already compares vector call cost with scalar call cost, returning the cheaper along with an indicator which is it.
Perhaps worth extending this API to compare the three alternatives, returning the cheapest along with an indicator(s) which is it(?)

> It should only change the debug output (VPlan printing) but not the generated code, so arguably this can be considered NFC (from the perspective of the generated code) or not.



================
Comment at: llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp:454
+
+  if (VectorIntrinsicID == Intrinsic::not_intrinsic)
+    O << " (using library function)";
----------------
nit: can ask if (VectorIntrinsicID) given that Intrinsic::not_intrinsic is fixed to zero.


================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:77
           NewRecipe =
-              new VPWidenCallRecipe(*CI, Plan->mapToVPValues(CI->args()));
+              new VPWidenCallRecipe(*CI, Plan->mapToVPValues(CI->args()), true);
         } else if (SelectInst *SI = dyn_cast<SelectInst>(Inst)) {
----------------
Pass some Intrinsic::ID instead of `true`?


================
Comment at: llvm/unittests/Transforms/Vectorize/VPlanTest.cpp:809
   Args.push_back(&Op2);
-  VPWidenCallRecipe Recipe(*Call, make_range(Args.begin(), Args.end()));
+  VPWidenCallRecipe Recipe(*Call, make_range(Args.begin(), Args.end()), false);
   EXPECT_TRUE(isa<VPUser>(&Recipe));
----------------
`false` is synonymous with Intrinsic::not_intrinsic being zero?


================
Comment at: llvm/unittests/Transforms/Vectorize/VPlanTest.cpp:1069
+    VPWidenCallRecipe Recipe(*Call, make_range(Args.begin(), Args.end()),
+                             false);
     EXPECT_TRUE(Recipe.mayHaveSideEffects());
----------------
ditto


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132585/new/

https://reviews.llvm.org/D132585



More information about the llvm-commits mailing list