[PATCH] D132458: [LoopVectorize] Support masked function vectorization

Thu Oct 13 02:29:34 PDT 2022

huntergr added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp:478
+      // TODO: Do we need TTI checks for masking here? Or can we
+      // assume it works by this point? Maybe add to the recipe...
+      if (!VectorF) {
----------------
fhahn wrote:
> It would be good if the decision whether to use the masked or non-masked variant were taken at the time of VPlan construction instead of during execution.
> 
> It would probably also be good to pass in the mask as an operand to the recipe, especially if we want to support non-trivial masks in the future.
So the problem I had with trying to decide up front was that you might have both masked and unmasked variants available, with the decision on which one to use left to the cost model, which I think runs after VPlan construction.

For example, on AArch64 you might have a non-masked NEON variant and a masked SVE variant. If you know the implementation width is 128 bits, then the SVE variant would cost slightly more because the mask has to be generated. If the width is 256 bits or higher, the extra cost might be worth it due to the additional parallelism.
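
To make that trade-off concrete, here is a rough, self-contained sketch of the kind of comparison I have in mind. It is a toy model only, not LLVM's cost model or TTI: `ToyVariant`, `toyCostPerLane`, `toyPickVariant`, and all of the cost numbers are hypothetical and exist purely for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical description of one vector variant of a scalar function.
// None of these names exist in LLVM.
struct ToyVariant {
  std::string Name;
  bool IsMasked;     // variant takes a mask argument
  bool IsScalable;   // width follows the implementation (SVE-like)
  unsigned MinBits;  // fixed width, or minimum width if scalable
  unsigned CallCost; // made-up cost of one vector call
};

// Made-up cost of materialising an all-true mask for a masked variant.
constexpr unsigned ToyMaskSetupCost = 1;

// Cost per scalar element processed, for a given implementation width.
double toyCostPerLane(const ToyVariant &V, unsigned ImplBits,
                      unsigned ElemBits) {
  unsigned Bits = V.IsScalable ? ImplBits : V.MinBits;
  unsigned Lanes = Bits / ElemBits;
  unsigned Cost = V.CallCost + (V.IsMasked ? ToyMaskSetupCost : 0);
  return static_cast<double>(Cost) / Lanes;
}

// Return the cheapest variant per lane, or nullptr if there are none.
const ToyVariant *toyPickVariant(const std::vector<ToyVariant> &Variants,
                                 unsigned ImplBits, unsigned ElemBits) {
  const ToyVariant *Best = nullptr;
  double BestCost = 0.0;
  for (const ToyVariant &V : Variants) {
    double C = toyCostPerLane(V, ImplBits, ElemBits);
    if (!Best || C < BestCost) {
      Best = &V;
      BestCost = C;
    }
  }
  return Best;
}
```

With one unmasked fixed-width 128-bit variant and one masked scalable variant of equal call cost, this toy model picks the unmasked one at an implementation width of 128 bits and the masked one at 256 bits, which is why both need to stay available until the width is known.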

Is there a (straightforward) way to tell VPlan that it may need to construct different recipes based on masked/non-masked variants being available? Or would this need some reworking of VPlan?

I did add the mask as an operand in the case where it is required, though. If we can generate multiple recipes easily, then the mask can also be added to the operands when a dummy mask is required (and possibly shared if there are multiple calls, giving a more accurate cost).
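
As a sketch of what sharing a dummy mask between calls could look like: the classes below are hypothetical stand-ins, not the real VPlan recipe or value classes, and `MaskSetups` is just an illustrative counter for how often the mask cost would be paid.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a value defined in the plan.
struct ToyValue {
  std::string Name;
};

// Hypothetical stand-in for a call recipe with an optional mask operand.
struct ToyCallRecipe {
  std::string Callee;
  const ToyValue *Mask; // nullptr when the chosen variant is unmasked
};

class ToyPlan {
  std::vector<std::unique_ptr<ToyValue>> Values;
  const ToyValue *AllTrueMask = nullptr;

public:
  std::vector<ToyCallRecipe> Calls;
  unsigned MaskSetups = 0; // how many times a dummy mask was created

  // Create the all-true mask on first use and reuse it afterwards.
  const ToyValue *getOrCreateAllTrueMask() {
    if (!AllTrueMask) {
      Values.push_back(std::make_unique<ToyValue>(ToyValue{"all.true"}));
      AllTrueMask = Values.back().get();
      ++MaskSetups;
    }
    return AllTrueMask;
  }

  // Record a call; masked variants get the shared dummy mask as an operand.
  void addCall(const std::string &Callee, bool NeedsMask) {
    Calls.push_back({Callee, NeedsMask ? getOrCreateAllTrueMask() : nullptr});
  }
};
```

In this toy version, two masked calls end up pointing at the same mask value, so the setup is counted once rather than once per call.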

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132458/new/

https://reviews.llvm.org/D132458