[PATCH] D132458: [LoopVectorize] Synthesize mask operands for vector variants as needed

Wed Jan 18 07:58:08 PST 2023

huntergr marked 3 inline comments as done.
huntergr added inline comments.

================
Comment at: llvm/include/llvm/Analysis/VectorUtils.h:137
+
+  bool isMasked() const { return getParamIndexForOptionalMask().has_value(); }
+
----------------
david-arm wrote:
> This function is never called - can it be deleted?
It's now used in the assert when building a recipe. (It was used in the original patch, but was left in when that patch was split into 3 parts.)

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3474
+  // function that does use a mask and synthesize an all-true mask.
+  if (!VecFunc && TTI.emitGetActiveLaneMask() != PredicationStyle::None) {
+    Shape = VFShape::get(*CI, VF, /*HasGlobalPred=*/true);
----------------
david-arm wrote:
> This looks a little strange to me. In my mind, the ability to emit an active lane mask based on two integer inputs is orthogonal to how cheap it is to broadcast a true bit across a predicate. For example, an architecture may cheaply support the latter, but not the former. Maybe X86 is such an example? Can we not just let the mask cost decide the behaviour? That way you can simplify this to just
> 
>   if (!VecFunc) {
>      ...
> 
My thinking was to treat the capability to emit an active lane mask as a proxy for being able to use masks at all, but perhaps that's a little too conservative.

I don't know if we should add a proper TTI interface to represent that capability, or just rely on the VFDatabase only having entries which the target is capable of supporting.

In any case, I've removed that check for now.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8380
         // version of the instruction.
+        if (Variant)
+          return false;
----------------
david-arm wrote:
> Doesn't this mean we may end up picking the least optimal VF? For example, if there are v2i32 and v4i32 masked variants we'll only ever pick the v2i32, i.e. the lowest VF?
No. Since we now store the pointer to the Function in the recipe, we need to force VPlan to generate a separate plan for each VF that has a vector variant available.

See the VPlan checks for 'test_v2_v4m' in synthesize-mask-for-call.ll: there are separate VF=2 and VF=4 plans, each with a widened call to a different function.
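
As a rough illustration of the per-VF selection described above, here is a standalone toy model. It is not the patch's code and uses no LLVM APIs: `Variant`, `WidenedCall`, `pickVariant` and the symbol names are all hypothetical. It assumes an unmasked variant is preferred when one exists for the VF, and that a masked variant is otherwise used with a synthesized all-true mask.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for one vector variant of a scalar function.
struct Variant {
  unsigned VF;      // vectorization factor the variant was built for
  bool Masked;      // true if the variant takes a mask parameter
  std::string Name; // illustrative symbol name only
};

// Hypothetical result of choosing a variant for a single VF.
struct WidenedCall {
  std::string Callee;
  std::vector<bool> Mask; // empty when the callee takes no mask
};

// Pick a variant for exactly this VF. Prefer an unmasked variant; otherwise
// fall back to a masked one and synthesize an all-true mask for it.
std::optional<WidenedCall> pickVariant(const std::vector<Variant> &DB,
                                       unsigned VF) {
  assert(VF > 0 && "VF must be non-zero");
  const Variant *MaskedMatch = nullptr;
  for (const Variant &V : DB) {
    if (V.VF != VF)
      continue;
    if (!V.Masked)
      return WidenedCall{V.Name, {}};
    if (!MaskedMatch)
      MaskedMatch = &V;
  }
  if (MaskedMatch)
    return WidenedCall{MaskedMatch->Name, std::vector<bool>(VF, true)};
  return std::nullopt; // no variant for this VF
}
```

Because the chosen callee depends on the VF, a plan that records the callee cannot be shared across VFs; in this model VF=2 and VF=4 each resolve to a different function.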

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132458/new/

https://reviews.llvm.org/D132458