[PATCH] D114171: [SLP]Improve reductions analysis and emission, part 1.

Fri May 20 17:28:01 PDT 2022

ABataev added a comment.

In D114171#3528903 <https://reviews.llvm.org/D114171#3528903>, @alexfh wrote:

> In D114171#3528894 <https://reviews.llvm.org/D114171#3528894>, @ABataev wrote:
>
>> In D114171#3528866 <https://reviews.llvm.org/D114171#3528866>, @alexfh wrote:
>>
>>> In D114171#3528093 <https://reviews.llvm.org/D114171#3528093>, @ABataev wrote:
>>>
>>>> Aha, I committed 4e271fc49517362a9333371fb1ab7e865d4c1b0e <https://reviews.llvm.org/rG4e271fc49517362a9333371fb1ab7e865d4c1b0e> earlier today, which should improve it. Try to update the compiler.
>>>
>>> Thanks! This patch makes the SLP vectorizer pass much faster on the problematic input, but it doesn't completely compensate for the slowdown introduced here. This is what -ftime-report looks like after 4e271fc49517362a9333371fb1ab7e865d4c1b0e <https://reviews.llvm.org/rG4e271fc49517362a9333371fb1ab7e865d4c1b0e>:
>>>
>>>   ===-------------------------------------------------------------------------===
>>>                         ... Pass execution timing report ...
>>>   ===-------------------------------------------------------------------------===
>>>     Total Execution Time: 308.9466 seconds (308.9519 wall clock)
>>>   
>>>      ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
>>>     132.7263 ( 45.9%)   0.0001 (  0.0%)  132.7264 ( 43.0%)  132.7351 ( 43.0%)  SLPVectorizerPass
>>>     48.0866 ( 16.6%)   5.7199 ( 29.3%)  53.8065 ( 17.4%)  53.8123 ( 17.4%)  ModuleInlinerWrapperPass
>>>     47.3402 ( 16.4%)   5.4604 ( 27.9%)  52.8007 ( 17.1%)  52.8060 ( 17.1%)  DevirtSCCRepeatedPass
>>>     22.2568 (  7.7%)   0.3222 (  1.6%)  22.5790 (  7.3%)  22.5785 (  7.3%)  GVNPass
>>>     11.3194 (  3.9%)   1.0507 (  5.4%)  12.3701 (  4.0%)  12.3520 (  4.0%)  InstCombinePass
>>>      4.8834 (  1.7%)   1.2548 (  6.4%)   6.1382 (  2.0%)   6.1350 (  2.0%)  InlinerPass
>>>
>>> And this is how it looked at 38d0df557706940af5d7110bdf662590449f8a60 <https://reviews.llvm.org/rG38d0df557706940af5d7110bdf662590449f8a60> (the closest commit before 7ea03f0b4e4ec5d91d48ba2976f5adc299089ffd <https://reviews.llvm.org/rG7ea03f0b4e4ec5d91d48ba2976f5adc299089ffd> where I could compile clang):
>>>
>>>   ===-------------------------------------------------------------------------===
>>>                         ... Pass execution timing report ...
>>>   ===-------------------------------------------------------------------------===
>>>     Total Execution Time: 181.4693 seconds (181.4723 wall clock)
>>>   
>>>      ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
>>>     48.1675 ( 29.7%)   5.6593 ( 29.3%)  53.8269 ( 29.7%)  53.8332 ( 29.7%)  ModuleInlinerWrapperPass
>>>     47.4270 ( 29.2%)   5.3983 ( 28.0%)  52.8253 ( 29.1%)  52.8315 ( 29.1%)  DevirtSCCRepeatedPass
>>>     22.1909 ( 13.7%)   0.2916 (  1.5%)  22.4824 ( 12.4%)  22.4825 ( 12.4%)  GVNPass
>>>     11.1989 (  6.9%)   1.0292 (  5.3%)  12.2281 (  6.7%)  12.2204 (  6.7%)  InstCombinePass
>>>      4.9943 (  3.1%)   1.1928 (  6.2%)   6.1871 (  3.4%)   6.1842 (  3.4%)  InlinerPass
>>>      5.2197 (  3.2%)   0.0000 (  0.0%)   5.2198 (  2.9%)   5.2201 (  2.9%)  SLPVectorizerPass
>>>
>>> Note the 5s -> 130s jump in time spent in SLPVectorizerPass. I'll grab the profile for the updated binary, but it will take some time. And yes, still trying to reduce the test case.
>>
>> The perf profile should help, thanks
>
> Looking at this I wonder whether SmallPtrSet is not that small? :)
>
>   -   86.47%     0.00%  clang    clang               [.] cc1_main
>      - cc1_main
>         - 86.47% clang::ExecuteCompilerInvocation
>            - 86.47% clang::CompilerInstance::ExecuteAction
>               - 86.47% clang::FrontendAction::Execute
>                  - 86.47% clang::ParseAST
>                     - 80.62% clang::BackendConsumer::HandleTranslationUnit
>                        - 80.17% clang::EmitBackendOutput
>                           - 63.55% (anonymous namespace)::EmitAssemblyHelper::RunOptimizationPipeline
>                              - 63.55% llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run
>                                 - 46.20% llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module>>::run
>                                    - 46.17% llvm::ModuleToFunctionPassAdaptor::run
>                                       - 45.90% llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run
>                                          - 45.89% llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run
>                                             - 43.23% llvm::detail::PassModel<llvm::Function, llvm::SLPVectorizerPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run
>                                                  llvm::SLPVectorizerPass::run
>                                                - llvm::SLPVectorizerPass::runImpl
>                                                   - 43.20% llvm::SLPVectorizerPass::vectorizeChainsInBlock
>                                                      - 42.84% llvm::SLPVectorizerPass::vectorizeSimpleInstructions
>                                                         - 41.02% llvm::SLPVectorizerPass::vectorizeRootInstruction
>                                                            - 40.93% (anonymous namespace)::HorizontalReduction::tryToReduce
>                                                               - 18.19% llvm::slpvectorizer::BoUpSLP::buildTree
>                                                                    9.89% llvm::SmallPtrSetImplBase::insert_imp_big
>                                                                  - 6.24% llvm::slpvectorizer::BoUpSLP::buildTree_rec
>                                                                     - 3.23% llvm::slpvectorizer::BoUpSLP::buildTree_rec(llvm::ArrayRef<llvm::Value*>, unsigned int, llvm::slpvectorizer::BoUpSLP::EdgeInfo const&)::$_32::operator()
>                                                                        + 1.61% llvm::DenseMapBase<llvm::DenseMap<llvm::Value*, unsigned int, llvm::DenseMapInfo<llvm::Value*, void>, llvm::detail::DenseMapPair<llvm::Value*, unsigned int> >, llvm::V
>                                                                       0.85% getSameOpcode
>                                                                     - 0.78% llvm::slpvectorizer::BoUpSLP::newTreeEntry
>                                                                          0.72% llvm::SmallPtrSetImplBase::insert_imp_big
>                                                                    0.98% llvm::SmallPtrSetImplBase::FindBucketFor
>                                                                 8.57% llvm::SmallPtrSetImplBase::FindBucketFor
>                                                               + 4.71% llvm::MapVector<llvm::Value*, llvm::SmallVector<llvm::Instruction*, 2u>, llvm::DenseMap<llvm::Value*, unsigned int, llvm::DenseMapInfo<llvm::Value*, void>, llvm::detail::DenseM
>                                                                 3.23% (anonymous namespace)::HorizontalReduction::tryToReduce(llvm::slpvectorizer::BoUpSLP&, llvm::TargetTransformInfo*)::{lambda(bool)#1}::operator()
>                                                                 0.70% memset
>                                                         + 1.82% tryToVectorizeSequence<llvm::Value>

Ok, thanks, will improve it on Monday. We can avoid creating the SmallPtrSet multiple times, and I'll check for other possible optimizations too.
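
For readers following along, here is a rough sketch of the general idea of not re-creating a pointer set on every iteration. It is not the actual SLPVectorizer code and not necessarily the fix that will land. std::unordered_set stands in for llvm::SmallPtrSet so the snippet builds without LLVM headers, and the function names and the "groups of candidate values" shape are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Stand-in for llvm::SmallPtrSet (assumption: used only so this sketch
// compiles without LLVM headers).
using PtrSet = std::unordered_set<const int *>;
using Groups = std::vector<std::vector<const int *>>;

// Pattern A: a fresh set is constructed and destroyed for every group.
// Returns the sum over all groups of the number of distinct pointers.
std::size_t countUniqueFresh(const Groups &Gs) {
  std::size_t Total = 0;
  for (const auto &G : Gs) {
    PtrSet Seen; // created anew on each iteration
    for (const int *P : G)
      if (Seen.insert(P).second)
        ++Total;
  }
  return Total;
}

// Pattern B: one set is hoisted out of the loop and cleared between groups,
// so the container object itself is constructed only once.
std::size_t countUniqueReused(const Groups &Gs) {
  std::size_t Total = 0;
  PtrSet Seen; // created once
  for (const auto &G : Gs) {
    Seen.clear(); // drop the previous group's elements
    for (const int *P : G)
      if (Seen.insert(P).second)
        ++Total;
  }
  return Total;
}
```

Both functions return the same count; only the number of set constructions differs. Whether hoisting alone is enough here is an open question, since the profile above points at insert_imp_big and FindBucketFor, i.e. sets that have already grown past their inline storage.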

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D114171