[PATCH] D114171: [SLP]Improve reductions analysis and emission, part 1.

Alexander Kornienko via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri May 20 17:33:25 PDT 2022


alexfh added a comment.

In D114171#3528930 <https://reviews.llvm.org/D114171#3528930>, @ABataev wrote:

> In D114171#3528903 <https://reviews.llvm.org/D114171#3528903>, @alexfh wrote:
>
>> In D114171#3528894 <https://reviews.llvm.org/D114171#3528894>, @ABataev wrote:
>>
>>> In D114171#3528866 <https://reviews.llvm.org/D114171#3528866>, @alexfh wrote:
>>>
>>>> In D114171#3528093 <https://reviews.llvm.org/D114171#3528093>, @ABataev wrote:
>>>>
>>>>> Aha, I committed 4e271fc49517362a9333371fb1ab7e865d4c1b0e <https://reviews.llvm.org/rG4e271fc49517362a9333371fb1ab7e865d4c1b0e> earlier today, which should improve it. Try to update the compiler.
>>>>
>>>> Thanks! This patch makes the SLP vectorizer pass much faster on the problematic input, but it doesn't completely compensate for the slowdown introduced here. This is what -ftime-report looks like after 4e271fc49517362a9333371fb1ab7e865d4c1b0e <https://reviews.llvm.org/rG4e271fc49517362a9333371fb1ab7e865d4c1b0e>:
>>>>
>>>>   ===-------------------------------------------------------------------------===
>>>>                         ... Pass execution timing report ...
>>>>   ===-------------------------------------------------------------------------===
>>>>     Total Execution Time: 308.9466 seconds (308.9519 wall clock)
>>>>   
>>>>      ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
>>>>     132.7263 ( 45.9%)   0.0001 (  0.0%)  132.7264 ( 43.0%)  132.7351 ( 43.0%)  SLPVectorizerPass
>>>>     48.0866 ( 16.6%)   5.7199 ( 29.3%)  53.8065 ( 17.4%)  53.8123 ( 17.4%)  ModuleInlinerWrapperPass
>>>>     47.3402 ( 16.4%)   5.4604 ( 27.9%)  52.8007 ( 17.1%)  52.8060 ( 17.1%)  DevirtSCCRepeatedPass
>>>>     22.2568 (  7.7%)   0.3222 (  1.6%)  22.5790 (  7.3%)  22.5785 (  7.3%)  GVNPass
>>>>     11.3194 (  3.9%)   1.0507 (  5.4%)  12.3701 (  4.0%)  12.3520 (  4.0%)  InstCombinePass
>>>>      4.8834 (  1.7%)   1.2548 (  6.4%)   6.1382 (  2.0%)   6.1350 (  2.0%)  InlinerPass
>>>>
>>>> And this is how it looked at 38d0df557706940af5d7110bdf662590449f8a60 <https://reviews.llvm.org/rG38d0df557706940af5d7110bdf662590449f8a60> (the closest commit before 7ea03f0b4e4ec5d91d48ba2976f5adc299089ffd <https://reviews.llvm.org/rG7ea03f0b4e4ec5d91d48ba2976f5adc299089ffd> where I could compile clang):
>>>>
>>>>   ===-------------------------------------------------------------------------===
>>>>                         ... Pass execution timing report ...
>>>>   ===-------------------------------------------------------------------------===
>>>>     Total Execution Time: 181.4693 seconds (181.4723 wall clock)
>>>>   
>>>>      ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
>>>>     48.1675 ( 29.7%)   5.6593 ( 29.3%)  53.8269 ( 29.7%)  53.8332 ( 29.7%)  ModuleInlinerWrapperPass
>>>>     47.4270 ( 29.2%)   5.3983 ( 28.0%)  52.8253 ( 29.1%)  52.8315 ( 29.1%)  DevirtSCCRepeatedPass
>>>>     22.1909 ( 13.7%)   0.2916 (  1.5%)  22.4824 ( 12.4%)  22.4825 ( 12.4%)  GVNPass
>>>>     11.1989 (  6.9%)   1.0292 (  5.3%)  12.2281 (  6.7%)  12.2204 (  6.7%)  InstCombinePass
>>>>      4.9943 (  3.1%)   1.1928 (  6.2%)   6.1871 (  3.4%)   6.1842 (  3.4%)  InlinerPass
>>>>      5.2197 (  3.2%)   0.0000 (  0.0%)   5.2198 (  2.9%)   5.2201 (  2.9%)  SLPVectorizerPass
>>>>
>>>> Note the 5s -> 130s jump in time spent in SLPVectorizerPass. I'll grab the profile for the updated binary, but it will take some time. And yes, still trying to reduce the test case.
>>>
>>> The perf profile should help, thanks
>>
>> Looking at this I wonder whether SmallPtrSet is not that small? :)
>>
>>   -   86.47%     0.00%  clang    clang               [.] cc1_main
>>      - cc1_main
>>         - 86.47% clang::ExecuteCompilerInvocation
>>            - 86.47% clang::CompilerInstance::ExecuteAction
>>               - 86.47% clang::FrontendAction::Execute
>>                  - 86.47% clang::ParseAST
>>                     - 80.62% clang::BackendConsumer::HandleTranslationUnit
>>                        - 80.17% clang::EmitBackendOutput
>>                           - 63.55% (anonymous namespace)::EmitAssemblyHelper::RunOptimizationPipeline
>>                              - 63.55% llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run
>>                                 - 46.20% llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module>>::run
>>                                    - 46.17% llvm::ModuleToFunctionPassAdaptor::run
>>                                       - 45.90% llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run
>>                                          - 45.89% llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run
>>                                             - 43.23% llvm::detail::PassModel<llvm::Function, llvm::SLPVectorizerPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run
>>                                                  llvm::SLPVectorizerPass::run
>>                                                - llvm::SLPVectorizerPass::runImpl
>>                                                   - 43.20% llvm::SLPVectorizerPass::vectorizeChainsInBlock
>>                                                      - 42.84% llvm::SLPVectorizerPass::vectorizeSimpleInstructions
>>                                                         - 41.02% llvm::SLPVectorizerPass::vectorizeRootInstruction
>>                                                            - 40.93% (anonymous namespace)::HorizontalReduction::tryToReduce
>>                                                               - 18.19% llvm::slpvectorizer::BoUpSLP::buildTree
>>                                                                    9.89% llvm::SmallPtrSetImplBase::insert_imp_big
>>                                                                  - 6.24% llvm::slpvectorizer::BoUpSLP::buildTree_rec
>>                                                                     - 3.23% llvm::slpvectorizer::BoUpSLP::buildTree_rec(llvm::ArrayRef<llvm::Value*>, unsigned int, llvm::slpvectorizer::BoUpSLP::EdgeInfo const&)::$_32::operator()
>>                                                                        + 1.61% llvm::DenseMapBase<llvm::DenseMap<llvm::Value*, unsigned int, llvm::DenseMapInfo<llvm::Value*, void>, llvm::detail::DenseMapPair<llvm::Value*, unsigned int> >, llvm::V
>>                                                                       0.85% getSameOpcode
>>                                                                     - 0.78% llvm::slpvectorizer::BoUpSLP::newTreeEntry
>>                                                                          0.72% llvm::SmallPtrSetImplBase::insert_imp_big
>>                                                                    0.98% llvm::SmallPtrSetImplBase::FindBucketFor
>>                                                                 8.57% llvm::SmallPtrSetImplBase::FindBucketFor
>>                                                               + 4.71% llvm::MapVector<llvm::Value*, llvm::SmallVector<llvm::Instruction*, 2u>, llvm::DenseMap<llvm::Value*, unsigned int, llvm::DenseMapInfo<llvm::Value*, void>, llvm::detail::DenseM
>>                                                                 3.23% (anonymous namespace)::HorizontalReduction::tryToReduce(llvm::slpvectorizer::BoUpSLP&, llvm::TargetTransformInfo*)::{lambda(bool)#1}::operator()
>>                                                                 0.70% memset
>>                                                         + 1.82% tryToVectorizeSequence<llvm::Value>
>
> Ok, thanks, will improve it on Monday. We can avoid creating the SmallPtrSet multiple times, and I'll check for other possible optimizations too.

Significant time seems to be spent in this loop as well:

  for (unsigned Cnt = 0; Cnt < NumReducedVals; ++Cnt) {
    if (Cnt >= Pos && Cnt < Pos + ReduxWidth)
      continue;
    unsigned NumOps = VectorizedVals.lookup(Candidates[Cnt]) +
                      std::count(VL.begin(), VL.end(), Candidates[Cnt]);
    if (NumOps != ReducedValsToOps.find(Candidates[Cnt])->second.size())
      LocalExternallyUsedValues[Candidates[Cnt]];
  }
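
One possible reason this loop is expensive: std::count rescans all of VL for every candidate, so the work grows roughly as NumReducedVals * VL.size(). Below is a minimal standalone sketch of one way to avoid the rescan by counting the VL occurrences once up front. It is not the actual fix and does not use LLVM types: Value is an int stand-in for llvm::Value*, std::unordered_map stands in for the DenseMap/MapVector containers, NumOpsPerVal is a hypothetical map holding what ReducedValsToOps.find(...)->second.size() returns, and both function names are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <unordered_map>
#include <vector>

using Value = int; // stand-in for llvm::Value* (assumption for this sketch)
using CountMap = std::unordered_map<Value, unsigned>;

// Mimics DenseMap::lookup: returns 0 when the key is absent.
static unsigned lookupOrZero(const CountMap &M, Value V) {
  auto It = M.find(V);
  return It == M.end() ? 0u : It->second;
}

// Baseline: same shape as the quoted loop, rescanning VL per candidate.
std::set<Value> externallyUsedNaive(
    const std::vector<Value> &Candidates, const std::vector<Value> &VL,
    std::size_t Pos, std::size_t ReduxWidth, const CountMap &VectorizedVals,
    const std::unordered_map<Value, std::size_t> &NumOpsPerVal) {
  std::set<Value> Result;
  for (std::size_t Cnt = 0; Cnt < Candidates.size(); ++Cnt) {
    if (Cnt >= Pos && Cnt < Pos + ReduxWidth)
      continue; // skip the window currently being vectorized
    unsigned NumOps =
        lookupOrZero(VectorizedVals, Candidates[Cnt]) +
        static_cast<unsigned>(std::count(VL.begin(), VL.end(), Candidates[Cnt]));
    if (NumOps != NumOpsPerVal.at(Candidates[Cnt]))
      Result.insert(Candidates[Cnt]);
  }
  return Result;
}

// Variant: count the occurrences in VL once, then do O(1) average lookups.
std::set<Value> externallyUsedPrecounted(
    const std::vector<Value> &Candidates, const std::vector<Value> &VL,
    std::size_t Pos, std::size_t ReduxWidth, const CountMap &VectorizedVals,
    const std::unordered_map<Value, std::size_t> &NumOpsPerVal) {
  CountMap CountInVL;
  for (Value V : VL)
    ++CountInVL[V];
  std::set<Value> Result;
  for (std::size_t Cnt = 0; Cnt < Candidates.size(); ++Cnt) {
    if (Cnt >= Pos && Cnt < Pos + ReduxWidth)
      continue; // skip the window currently being vectorized
    unsigned NumOps = lookupOrZero(VectorizedVals, Candidates[Cnt]) +
                      lookupOrZero(CountInVL, Candidates[Cnt]);
    if (NumOps != NumOpsPerVal.at(Candidates[Cnt]))
      Result.insert(Candidates[Cnt]);
  }
  return Result;
}
```

Whether this helps in practice depends on how large VL and the candidate list get on the problematic input, so it would need to be checked against the profile.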


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114171/new/

https://reviews.llvm.org/D114171


