[PATCH] D147996: [X86] combineConcatVectorOps - remove FADD/FSUB/FMUL handling (2-1)

Mon Apr 17 06:46:02 PDT 2023

LuoYuanke added a comment.

In D147996#4273314 <https://reviews.llvm.org/D147996#4273314>, @RKSimon wrote:

> LGTM if it's causing regressions, but I'd appreciate any time you can spend on PR60441

There is an existing ScalarizerPass that can scalarize vector operations, but it is not enabled in the pipeline. We could scalarize the small vectors before vectorization and let the vectorizer re-vectorize them. I did a rough experiment with the patch below, and it seems the code at https://godbolt.org/z/sojxs9EGK can then be vectorized with the command `clang -g0 -O3 -march=x86-64-v4 -ffast-math -mllvm -scalarize-load-store t.cpp -S`

  diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
  index 4b759693fec2..a74a77872eb7 100644
  --- a/llvm/lib/Passes/PassBuilderPipelines.cpp
  +++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
  @@ -113,6 +113,7 @@
   #include "llvm/Transforms/Scalar/Reassociate.h"
   #include "llvm/Transforms/Scalar/SCCP.h"
   #include "llvm/Transforms/Scalar/SROA.h"
  +#include "llvm/Transforms/Scalar/Scalarizer.h"
   #include "llvm/Transforms/Scalar/SimpleLoopUnswitch.h"
   #include "llvm/Transforms/Scalar/SimplifyCFG.h"
   #include "llvm/Transforms/Scalar/SpeculativeExecution.h"
  @@ -968,6 +969,7 @@ PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level,
     EarlyFPM.addPass(LowerExpectIntrinsicPass());
     EarlyFPM.addPass(SimplifyCFGPass());
     EarlyFPM.addPass(SROAPass(SROAOptions::ModifyCFG));
  +  EarlyFPM.addPass(ScalarizerPass());
     EarlyFPM.addPass(EarlyCSEPass());
     if (Level == OptimizationLevel::O3)
       EarlyFPM.addPass(CallSiteSplittingPass());
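
To illustrate the idea at source level, here is a minimal sketch. It is only an analogy: the Scalarizer works on LLVM IR, not on C++ source, and this is not the code behind the godbolt link. `Vec2`, `add_vec2`, `add_scalar` and `results_match` are hypothetical names made up for the example. The first loop operates on 2-wide values, which is roughly the shape that ties codegen to narrow vectors; the second is the same arithmetic over flat scalars, which is the form a vectorizer is free to widen to whatever the target supports.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical 2-element vector type, standing in for a <2 x float> IR value.
struct Vec2 {
  float x, y;
};

// Small-vector form: each iteration adds one 2-wide value.
void add_vec2(const Vec2 *a, const Vec2 *b, Vec2 *out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    out[i] = Vec2{a[i].x + b[i].x, a[i].y + b[i].y};
}

// Scalarized form: the same additions expressed over 2*n independent floats.
void add_scalar(const float *a, const float *b, float *out, std::size_t n) {
  for (std::size_t i = 0; i < 2 * n; ++i)
    out[i] = a[i] + b[i];
}

// Runs both forms on identical data and checks they agree element by element.
bool results_match(std::size_t n) {
  std::vector<Vec2> va(n), vb(n), vout(n);
  std::vector<float> fa(2 * n), fb(2 * n), fout(2 * n);
  for (std::size_t i = 0; i < 2 * n; ++i) {
    fa[i] = static_cast<float>(i) * 0.5f;
    fb[i] = static_cast<float>(i) + 1.0f;
  }
  for (std::size_t i = 0; i < n; ++i) {
    va[i] = Vec2{fa[2 * i], fa[2 * i + 1]};
    vb[i] = Vec2{fb[2 * i], fb[2 * i + 1]};
  }
  add_vec2(va.data(), vb.data(), vout.data(), n);
  add_scalar(fa.data(), fb.data(), fout.data(), n);
  for (std::size_t i = 0; i < n; ++i)
    if (vout[i].x != fout[2 * i] || vout[i].y != fout[2 * i + 1])
      return false;
  return true;
}
```

Both forms compute the same values; the point of the experiment above is only that the scalar form leaves the choice of vector width to the vectorizer.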

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D147996/new/

https://reviews.llvm.org/D147996