[PATCH] D150288: [SLP] Simplify `tryToVectorizeSequence()`

Thu May 18 16:31:27 PDT 2023

aeubanks added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:14276
-    function_ref<bool(ArrayRef<T *>, bool)> TryToVectorizeHelper,
-    bool MaxVFOnly, BoUpSLP &R) {
   bool Changed = false;
----------------
vporpo wrote:
> ABataev wrote:
> > vporpo wrote:
> > > ABataev wrote:
> > > > ABataev wrote:
> > > > > vporpo wrote:
> > > > > > ABataev wrote:
> > > > > > > vporpo wrote:
> > > > > > > > ABataev wrote:
> > > > > > > > > vporpo wrote:
> > > > > > > > > > ABataev wrote:
> > > > > > > > > > > vporpo wrote:
> > > > > > > > > > > > ABataev wrote:
> > > > > > > > > > > > > ABataev wrote:
> > > > > > > > > > > > > > vporpo wrote:
> > > > > > > > > > > > > > > ABataev wrote:
> > > > > > > > > > > > > > > > vporpo wrote:
> > > > > > > > > > > > > > > > > ABataev wrote:
> > > > > > > > > > > > > > > > > > Removal of this parameter causes regressions in the vectorization, better to restore it.
> > > > > > > > > > > > > > > > > Do you have some examples where this is needed?
> > > > > > > > > > > > > > > > No, I don't, just from my experience when I added this parameter. Testing showed regressions in LTO mode with it, IIRC.
> > > > > > > > > > > > > > > Is this in SPEC?
> > > > > > > > > > > > > > IIRC, test/Transforms/SLPVectorizer/slp-max-phi-size.ll is one of the tests
> > > > > > > > > > > > > Yes
> > > > > > > > > > > > OK, I will run SPEC lto to check. 
> > > > > > > > > > > > 
> > > > > > > > > > > > > IIRC, test/Transforms/SLPVectorizer/slp-max-phi-size.ll is one of the tests
> > > > > > > > > > > > Doesn't slp-max-phi-size.ll look OK though? As far as I understand the purpose of this test is to make sure that we don't vectorize wider than specified by the `-slp-max-vf` option.
> > > > > > > > > > > Probably. If I do remember correctly, it caused regresssions because of too early optimization of phi nodes and some other instructions.
> > > > > > > > > > I checked the C/C++ benchmarks of CPU2017 and I don't see any regressions. I also checked with our internal testing and there are no issues. So I think it is pretty safe. 
> > > > > > > > > > In the unlikely case that it causes a regression, we can look into it again and add any missing tests. Wdyt?
> > > > > > > > > Did you try LTO?
> > > > > > > > Ah let me check. I guess you mean `-flto` right?
> > > > > > > Yes. There were regressions because of too early phi nodes vectorization at compile time, which prevents some later optimizations at link time. We could fix this by checking if LTO is enabled and the stage is linking. In this case we can set the flag to true for phis and other nodes. It would be good if you could make a patch for this!
> > > > > > I checked  an LTO build with and I didn't see any performance difference. It's all within the noise margin. 
> > > > > > This is the list of test I tried:
> > > > > > ```
> > > > > > 500.perlbench_r
> > > > > > 502.gcc_r
> > > > > > 505.mcf_r
> > > > > > 508.namd_r
> > > > > > 511.povray_r
> > > > > > 519.lbm_r
> > > > > > 523.xalancbmk_r
> > > > > > 525.x264_r
> > > > > > 531.deepsjeng_r
> > > > > > 538.imagick_r
> > > > > > 544.nab_r
> > > > > > 557.xz_r
> > > > > > ```
> > > > > > However, my build configuration may be different than yours. Could you try out the patch with your configuration and point me to the right benchmark/build options?
> > > > > > 
> > > > > > Changing the functionality depending on whether we are in the linking stage, just to make a SPEC test run faster in some specific configuration does not sound a very good solution to me. 
> > > > > The solution is the same, just at link time we can ignore limits and do our best to vectorize even small sequences, which may cause regression at compile time.
> > > > Also, try to compile everything with -march=native on something like Skylake
> > > Yeah I used `native` for the builds, and I ran it on a `Xeon(R) Gold 6154`.
> > > 
> > > > just at link time we can ignore limits and do our best to vectorize even small sequences
> > > So do you mean like setting `MaxVFOnly` to true only for the linking stage of LTO ?
> > > Yeah I used `native` for the builds, and I ran it on a `Xeon(R) Gold 6154`.
> > > 
> > 
> > Then check with the default target also (SSE2). Check D103638, D108740, D92059 for more context about the changes.
> > 
> > > > just at link time we can ignore limits and do our best to vectorize even small sequences
> > > So do you mean like setting `MaxVFOnly` to true only for the linking stage of LTO ?
> > 
> > Yes, something like this. Because currently we have to limit some vectorization to avoid regressions with LTO.
> > 
> > Yes, something like this. Because currently we have to limit some vectorization to avoid regressions with LTO.
> 
> But how am I going to test how well this will work? I am pretty sure this won't be the only phase-ordering issue in LTO builds if SLP is enabled in pre-link. 
https://reviews.llvm.org/D148010 sounds like the proper solution for this sort of issue

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D150288/new/

https://reviews.llvm.org/D150288