[PATCH] D132585: [VPlan] Add field to track if intrinsic should be used for call. (NFC)

Wed Aug 31 09:57:58 PDT 2022

fhahn added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8329
+        InstructionCost CallCost =
+            CM.getVectorCallCost(CI, VF, NeedToScalarize);
+        InstructionCost IntrinsicCost =
----------------
Ayal wrote:
> fhahn wrote:
> > Ayal wrote:
> > > Avoid considering CallCost if NeedToScalarize is true?
> > > 
> > > Avoid getting decision and clamping Range if !ID, when a vector call can be used, e.g., w/o clamping Range (WillWiden)?
> > > 
> > > The compound decision for which (range of) VF's to use an intrinsic vs. call vs. neither should probably be retained instead of decomposing it into two independent clamps? Calls for better test coverage to make sure patch is indeed NFC.
> > > Avoid considering CallCost if NeedToScalarize is true?
> > 
> > I am not sure if we need to handle this explicitly, as the cost comparison should either choose the vector intrinsic (if it is cheaper than the lib call, which may get scalarized) or `CanUseVectorCall` will also be false.
> > 
> > > Avoid getting decision and clamping Range if !ID, when a vector call can be used, e.g., w/o clamping Range (WillWiden)?
> > 
> > Added a check, thanks!
> > 
> > > The compound decision for which (range of) VF's to use an intrinsic vs. call vs. neither should probably be retained instead of decomposing it into two independent clamps? Calls for better test coverage to make sure patch is indeed NFC.
> > 
> > I think we need to clamp both separately. Before, we could have VPlans where we either use lib functions or intrinsics for the same call for different VFs. Now we need to split them to track whether an intrinsic or libfunc should be used. I added a test case to show this: 005d1a8ff533
> > 
> > It should only change the debug output (VPlan printing) but not the generated code, so arguably this can be considered NFC (from the perspective of the generated code) or not.
> > > The compound decision for which (range of) VF's to use an intrinsic vs. call vs. neither should probably be retained instead of decomposing it into two independent clamps? Calls for better test coverage to make sure patch is indeed NFC.
> 
> > I think we need to clamp both separately. Before, we could have VPlans where we either use lib functions or intrinsics for the same call for different VFs. Now we need to split them to track whether an intrinsic or libfunc should be used. I added a test case to show this: 005d1a8ff533
> 
> Hmm, getDecisionAndClampRange() works with boolean decisions rather than 3-way ones. This may result in excessive clamping, which is OK albeit potentially conservative. E.g., say the first VF of the range, VF=2, can make a vector call but the next, VF=4, cannot, while both can make an intrinsic call more efficiently; the range would then be clamped after VF=2 needlessly.
> 
> One way to optimize the clamping is to figure out the compound decision for the first VF of the range and then call getDecisionAndClampRange() accordingly. Worth the hassle?
> 
> 
> ```
>       // Take the compound decision once, at the first VF of the range.
>       bool ScalarBetterThanVectorAtStart;
>       InstructionCost CallCostAtStart = CM.getVectorCallCost(
>           CI, Range.Start, ScalarBetterThanVectorAtStart);
>       bool IntrinsicBestAtStart =
>           ID && CM.getVectorIntrinsicCost(CI, Range.Start) < CallCostAtStart;
> 
>       // Clamp the range at the first VF whose compound decision differs.
>       LoopVectorizationPlanner::getDecisionAndClampRange(
>           [&](ElementCount VF) -> bool {
>             bool ScalarBetterThanVectorAtVF;
>             // Is it beneficial to perform intrinsic call compared to lib call?
>             InstructionCost CallCostAtVF =
>                 CM.getVectorCallCost(CI, VF, ScalarBetterThanVectorAtVF);
>             bool IntrinsicBestAtVF =
>                 ID && CM.getVectorIntrinsicCost(CI, VF) < CallCostAtVF;
>             // If the intrinsic wins at both VFs, the call-vs-scalar flag does
>             // not matter; otherwise it must agree with the flag at Range.Start.
>             return IntrinsicBestAtStart == IntrinsicBestAtVF &&
>                    (IntrinsicBestAtStart ||
>                     ScalarBetterThanVectorAtStart == ScalarBetterThanVectorAtVF);
>           },
>           Range);
> ```
> 
> CM.getVectorCallCost() already compares the vector call cost with the scalar call cost, returning the cheaper one along with an indicator of which it is.
> Perhaps it is worth extending this API to compare all three alternatives, returning the cheapest along with indicator(s) of which it is?
> 
> > It should only change the debug output (VPlan printing) but not the generated code, so arguably this can be considered NFC (from the perspective of the generated code) or not.
> 
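A small standalone sketch of the excessive-clamping scenario described above may help. It does not use LLVM: `VFRange`, `getDecisionAndClampRange`, `CallKind`, `CostTable`, `cheapest` and `compareClamps` are simplified, hypothetical stand-ins (fixed-width power-of-two VFs, integer costs), and the costs are made up to match the case where a vector lib call is usable at VF=2 but not at VF=4 while the intrinsic is cheapest at both.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>

// Hypothetical stand-in for VFRange: half-open [Start, End) of
// power-of-two, fixed-width vectorization factors.
struct VFRange {
  unsigned Start;
  unsigned End;
};

// Simplified model of getDecisionAndClampRange(): evaluate the predicate at
// Range.Start, shrink Range.End to the first VF where the result differs,
// and return the decision taken at Range.Start.
bool getDecisionAndClampRange(const std::function<bool(unsigned)> &Predicate,
                              VFRange &Range) {
  assert(Range.Start < Range.End && "Trying to test an empty VF range.");
  bool AtStart = Predicate(Range.Start);
  for (unsigned VF = Range.Start * 2; VF < Range.End; VF *= 2)
    if (Predicate(VF) != AtStart) {
      Range.End = VF;
      break;
    }
  return AtStart;
}

// Hypothetical three-way outcome for a call at a given VF.
enum class CallKind { Intrinsic, LibCall, Scalarize };

// Made-up costs: a vector lib call beats scalarizing at VF=2 but not at
// VF=4, while the intrinsic is the cheapest option at both VFs.
struct Costs {
  int Intrinsic, LibCall, Scalar;
};
const std::map<unsigned, Costs> CostTable = {{2, {1, 4, 8}},
                                             {4, {2, 100, 16}}};

// Cheapest of the three alternatives at VF.
CallKind cheapest(unsigned VF) {
  const Costs &C = CostTable.at(VF);
  if (C.Intrinsic <= C.LibCall && C.Intrinsic <= C.Scalar)
    return CallKind::Intrinsic;
  return C.LibCall <= C.Scalar ? CallKind::LibCall : CallKind::Scalarize;
}

// Clamps [2, 8) once on the boolean "lib call beats scalarizing" and once on
// the compound decision; returns {boolean-clamped, compound-clamped}.
std::pair<VFRange, VFRange> compareClamps() {
  VFRange Boolean{2, 8};
  getDecisionAndClampRange(
      [](unsigned VF) {
        const Costs &C = CostTable.at(VF);
        return C.LibCall < C.Scalar;
      },
      Boolean);

  VFRange Compound{2, 8};
  CallKind AtStart = cheapest(Compound.Start);
  getDecisionAndClampRange(
      [AtStart](unsigned VF) { return cheapest(VF) == AtStart; }, Compound);
  return {Boolean, Compound};
}
```

With these costs, clamping on the boolean "lib call beats scalarizing" shrinks [2, 8) to [2, 4), whereas clamping on the compound three-way decision keeps the whole range, since the intrinsic is the cheapest option at both VFs.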
Hm, I tried to restructure the code to make things a bit clearer.

If an intrinsic is available, first clamp the range on whether the intrinsic call is profitable and, if it is profitable at the start, return the intrinsic recipe. If the intrinsic call is profitable at the start, the range is clamped at the first VF where it becomes unprofitable; if it is not profitable at the start, the range is clamped at the first VF where it becomes profitable.

If it is not profitable to use an intrinsic call at the start, the lib call must be the cheaper option there. Then clamp the range at the first VF where using a lib call is no longer profitable.

I *think* that should avoid excessive clamping in most cases in practice and the code seems easier to follow. WDYT?
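The restructured flow can be sketched standalone as well, again without LLVM: `VFRange`, `getDecisionAndClampRange`, `CallCosts` and `decideCall` are hypothetical, simplified stand-ins, and treating a tie between intrinsic and call cost as favouring the intrinsic is an arbitrary choice for the sketch. The intrinsic decision is clamped first; only if it fails at the start of the range is the range clamped again on whether the call can avoid scalarizing.

```cpp
#include <cassert>
#include <functional>
#include <map>

// Hypothetical stand-in for VFRange: half-open [Start, End) of
// power-of-two, fixed-width vectorization factors.
struct VFRange {
  unsigned Start;
  unsigned End;
};

// Simplified model of getDecisionAndClampRange(): decide at Range.Start and
// shrink Range.End to the first VF where the predicate disagrees.
bool getDecisionAndClampRange(const std::function<bool(unsigned)> &Predicate,
                              VFRange &Range) {
  assert(Range.Start < Range.End && "Trying to test an empty VF range.");
  bool AtStart = Predicate(Range.Start);
  for (unsigned VF = Range.Start * 2; VF < Range.End; VF *= 2)
    if (Predicate(VF) != AtStart) {
      Range.End = VF;
      break;
    }
  return AtStart;
}

enum class CallKind { Intrinsic, LibCall, Scalarize };

// Hypothetical per-VF costs.
struct CallCosts {
  int Intrinsic;        // cost of the vector intrinsic
  int Call;             // cost of the call path (lib call or scalarized)
  bool NeedToScalarize; // true if the call path would be scalarized
};

CallKind decideCall(bool HasIntrinsic,
                    const std::map<unsigned, CallCosts> &Costs,
                    VFRange &Range) {
  // Step 1: only if an intrinsic exists, clamp on "intrinsic is profitable".
  if (HasIntrinsic &&
      getDecisionAndClampRange(
          [&Costs](unsigned VF) {
            const CallCosts &C = Costs.at(VF);
            return C.Intrinsic <= C.Call;
          },
          Range))
    return CallKind::Intrinsic;

  // Step 2: the intrinsic is not used at Range.Start; clamp the (possibly
  // already clamped) range on "the call can be vectorized".
  if (getDecisionAndClampRange(
          [&Costs](unsigned VF) { return !Costs.at(VF).NeedToScalarize; },
          Range))
    return CallKind::LibCall;
  return CallKind::Scalarize;
}
```

For the VF=2/VF=4 costs discussed above (intrinsic cheapest at both VFs, lib call usable only at VF=2), `decideCall` returns the intrinsic for the whole range [2, 8), so the lib-call clamp never runs. Without an intrinsic, the same costs give a lib call clamped to [2, 4).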

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:4167

   Intrinsic::ID ID = getVectorIntrinsicIDForCall(&CI, TLI);

----------------
Ayal wrote:
> This is now also redundant given VectorIntrinsicID?
Yes, should be removed!

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:4176
       Value *Arg;
-      if (!UseVectorIntrinsic ||
-          !isVectorIntrinsicWithScalarOpAtArg(ID, I.index()))
+      if (VectorIntrinsicID == Intrinsic::not_intrinsic ||
+          !isVectorIntrinsicWithScalarOpAtArg(VectorIntrinsicID, I.index()))
----------------
Ayal wrote:
> nit: can ask if (!VectorIntrinsicID || ...) given that Intrinsic::not_intrinsic is fixed to zero.
Explicitly checking `== Intrinsic::not_intrinsic` may be clearer, but it seems too verbose, so I simplified it.
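The simplification relies on `Intrinsic::not_intrinsic` being zero, so `!VectorIntrinsicID` means "no intrinsic". Below is a minimal standalone sketch of the operand loop under hypothetical stand-ins: the `Intrinsic` namespace only mirrors the shape of LLVM's (an unsigned ID with `not_intrinsic == 0`; the numeric value of `powi` is made up), and `isScalarOpAtArgSketch` models `isVectorIntrinsicWithScalarOpAtArg()` for a single assumed case, the `powi` exponent.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical mirror of LLVM's intrinsic IDs: an unsigned with 0 reserved.
namespace Intrinsic {
using ID = unsigned;
enum : ID { not_intrinsic = 0, powi = 1 }; // value of powi is made up
} // namespace Intrinsic

static_assert(Intrinsic::not_intrinsic == 0,
              "the !ID shorthand relies on not_intrinsic being zero");

// Hypothetical stand-in: only powi's exponent (argument 1) stays scalar.
bool isScalarOpAtArgSketch(Intrinsic::ID ID, std::size_t ArgIdx) {
  return ID == Intrinsic::powi && ArgIdx == 1;
}

// Classifies each call argument as widened ("vector") or kept "scalar".
// `!VectorIntrinsicID` is equivalent to `== Intrinsic::not_intrinsic`.
std::vector<std::string> operandKinds(Intrinsic::ID VectorIntrinsicID,
                                      std::size_t NumArgs) {
  std::vector<std::string> Kinds;
  for (std::size_t I = 0; I < NumArgs; ++I) {
    if (!VectorIntrinsicID || !isScalarOpAtArgSketch(VectorIntrinsicID, I))
      Kinds.push_back("vector");
    else
      Kinds.push_back("scalar");
  }
  return Kinds;
}
```

For a lib call every argument is widened; with the intrinsic, the exponent stays scalar.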

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:4187
     Function *VectorF;
-    if (UseVectorIntrinsic) {
+    if (VectorIntrinsicID != Intrinsic::not_intrinsic) {
       // Use vector version of the intrinsic.
----------------
Ayal wrote:
> nit: can ask if (VectorIntrinsicID) given that Intrinsic::not_intrinsic is fixed to zero.
Simplified, thanks!

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8321
+  bool CanUseVectorIntrinsic =
+      ID != Intrinsic::not_intrinsic &&
+      LoopVectorizationPlanner::getDecisionAndClampRange(
----------------
Ayal wrote:
> nit: can ask if ID given that Intrinsic::not_intrinsic is fixed to zero.
Simplified, thanks!

================
Comment at: llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp:454
+
+  if (VectorIntrinsicID == Intrinsic::not_intrinsic)
+    O << " (using library function)";
----------------
Ayal wrote:
> nit: can ask if (VectorIntrinsicID) given that Intrinsic::not_intrinsic is fixed to zero.
Simplified, thanks!
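For illustration, here is a hypothetical, heavily simplified recipe that carries the new field and prints the suffix from the diff above. `WidenCallRecipeSketch`, `printed` and the `WIDEN-CALL @name` prefix are made up; only the " (using library function)" text comes from the patch.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical mirror of LLVM's intrinsic IDs: an unsigned with 0 reserved.
namespace Intrinsic {
using ID = unsigned;
enum : ID { not_intrinsic = 0 };
} // namespace Intrinsic

// Hypothetical recipe: the only state modeled is the callee name and the
// field recording which vector intrinsic (if any) to use.
struct WidenCallRecipeSketch {
  std::string CalleeName;
  Intrinsic::ID VectorIntrinsicID;

  // The suffix makes the intrinsic-vs-lib-call choice visible when printing.
  void print(std::ostream &O) const {
    O << "WIDEN-CALL @" << CalleeName;
    if (!VectorIntrinsicID)
      O << " (using library function)";
  }
};

// Renders a recipe to a string via print().
std::string printed(const WidenCallRecipeSketch &R) {
  std::ostringstream OS;
  R.print(OS);
  return OS.str();
}
```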

Repository:
  rG LLVM Github Monorepo


https://reviews.llvm.org/D132585