[PATCH] D114171: [SLP]Improve reductions analysis and emission, part 1.
Alexey Bataev via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri May 20 17:28:01 PDT 2022
ABataev added a comment.
In D114171#3528903 <https://reviews.llvm.org/D114171#3528903>, @alexfh wrote:
> In D114171#3528894 <https://reviews.llvm.org/D114171#3528894>, @ABataev wrote:
>
>> In D114171#3528866 <https://reviews.llvm.org/D114171#3528866>, @alexfh wrote:
>>
>>> In D114171#3528093 <https://reviews.llvm.org/D114171#3528093>, @ABataev wrote:
>>>
>>>> Aha, I committed 4e271fc49517362a9333371fb1ab7e865d4c1b0e <https://reviews.llvm.org/rG4e271fc49517362a9333371fb1ab7e865d4c1b0e> earlier today, which should improve it. Try to update the compiler.
>>>
>>> Thanks! This patch makes the SLP vectorizer pass much faster on the problematic input, but it doesn't completely compensate for the slowdown introduced here. This is what -ftime-report looks like after 4e271fc49517362a9333371fb1ab7e865d4c1b0e <https://reviews.llvm.org/rG4e271fc49517362a9333371fb1ab7e865d4c1b0e>:
>>>
>>> ===-------------------------------------------------------------------------===
>>> ... Pass execution timing report ...
>>> ===-------------------------------------------------------------------------===
>>> Total Execution Time: 308.9466 seconds (308.9519 wall clock)
>>>
>>> ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
>>> 132.7263 ( 45.9%) 0.0001 ( 0.0%) 132.7264 ( 43.0%) 132.7351 ( 43.0%) SLPVectorizerPass
>>> 48.0866 ( 16.6%) 5.7199 ( 29.3%) 53.8065 ( 17.4%) 53.8123 ( 17.4%) ModuleInlinerWrapperPass
>>> 47.3402 ( 16.4%) 5.4604 ( 27.9%) 52.8007 ( 17.1%) 52.8060 ( 17.1%) DevirtSCCRepeatedPass
>>> 22.2568 ( 7.7%) 0.3222 ( 1.6%) 22.5790 ( 7.3%) 22.5785 ( 7.3%) GVNPass
>>> 11.3194 ( 3.9%) 1.0507 ( 5.4%) 12.3701 ( 4.0%) 12.3520 ( 4.0%) InstCombinePass
>>> 4.8834 ( 1.7%) 1.2548 ( 6.4%) 6.1382 ( 2.0%) 6.1350 ( 2.0%) InlinerPass
>>>
>>> And this is how it looked at 38d0df557706940af5d7110bdf662590449f8a60 <https://reviews.llvm.org/rG38d0df557706940af5d7110bdf662590449f8a60> (the closest commit before 7ea03f0b4e4ec5d91d48ba2976f5adc299089ffd <https://reviews.llvm.org/rG7ea03f0b4e4ec5d91d48ba2976f5adc299089ffd> where I could compile clang):
>>>
>>> ===-------------------------------------------------------------------------===
>>> ... Pass execution timing report ...
>>> ===-------------------------------------------------------------------------===
>>> Total Execution Time: 181.4693 seconds (181.4723 wall clock)
>>>
>>> ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
>>> 48.1675 ( 29.7%) 5.6593 ( 29.3%) 53.8269 ( 29.7%) 53.8332 ( 29.7%) ModuleInlinerWrapperPass
>>> 47.4270 ( 29.2%) 5.3983 ( 28.0%) 52.8253 ( 29.1%) 52.8315 ( 29.1%) DevirtSCCRepeatedPass
>>> 22.1909 ( 13.7%) 0.2916 ( 1.5%) 22.4824 ( 12.4%) 22.4825 ( 12.4%) GVNPass
>>> 11.1989 ( 6.9%) 1.0292 ( 5.3%) 12.2281 ( 6.7%) 12.2204 ( 6.7%) InstCombinePass
>>> 4.9943 ( 3.1%) 1.1928 ( 6.2%) 6.1871 ( 3.4%) 6.1842 ( 3.4%) InlinerPass
>>> 5.2197 ( 3.2%) 0.0000 ( 0.0%) 5.2198 ( 2.9%) 5.2201 ( 2.9%) SLPVectorizerPass
>>>
>>> Note the 5s -> 130s jump in time spent in SLPVectorizerPass. I'll grab the profile for the updated binary, but it will take some time. And yes, still trying to reduce the test case.
>>
>> The perf profile should help, thanks
>
> Looking at this I wonder whether SmallPtrSet is not that small? :)
>
> - 86.47% 0.00% clang clang [.] cc1_main
> - cc1_main
> - 86.47% clang::ExecuteCompilerInvocation
> - 86.47% clang::CompilerInstance::ExecuteAction
> - 86.47% clang::FrontendAction::Execute
> - 86.47% clang::ParseAST
> - 80.62% clang::BackendConsumer::HandleTranslationUnit
> - 80.17% clang::EmitBackendOutput
> - 63.55% (anonymous namespace)::EmitAssemblyHelper::RunOptimizationPipeline
> - 63.55% llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run
> - 46.20% llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module>>::run
> - 46.17% llvm::ModuleToFunctionPassAdaptor::run
> - 45.90% llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run
> - 45.89% llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run
> - 43.23% llvm::detail::PassModel<llvm::Function, llvm::SLPVectorizerPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run
> llvm::SLPVectorizerPass::run
> - llvm::SLPVectorizerPass::runImpl
> - 43.20% llvm::SLPVectorizerPass::vectorizeChainsInBlock
> - 42.84% llvm::SLPVectorizerPass::vectorizeSimpleInstructions
> - 41.02% llvm::SLPVectorizerPass::vectorizeRootInstruction
> - 40.93% (anonymous namespace)::HorizontalReduction::tryToReduce
> - 18.19% llvm::slpvectorizer::BoUpSLP::buildTree
> 9.89% llvm::SmallPtrSetImplBase::insert_imp_big
> - 6.24% llvm::slpvectorizer::BoUpSLP::buildTree_rec
> - 3.23% llvm::slpvectorizer::BoUpSLP::buildTree_rec(llvm::ArrayRef<llvm::Value*>, unsigned int, llvm::slpvectorizer::BoUpSLP::EdgeInfo const&)::$_32::operator()
> + 1.61% llvm::DenseMapBase<llvm::DenseMap<llvm::Value*, unsigned int, llvm::DenseMapInfo<llvm::Value*, void>, llvm::detail::DenseMapPair<llvm::Value*, unsigned int> >, llvm::V
> 0.85% getSameOpcode
> - 0.78% llvm::slpvectorizer::BoUpSLP::newTreeEntry
> 0.72% llvm::SmallPtrSetImplBase::insert_imp_big
> 0.98% llvm::SmallPtrSetImplBase::FindBucketFor
> 8.57% llvm::SmallPtrSetImplBase::FindBucketFor
> + 4.71% llvm::MapVector<llvm::Value*, llvm::SmallVector<llvm::Instruction*, 2u>, llvm::DenseMap<llvm::Value*, unsigned int, llvm::DenseMapInfo<llvm::Value*, void>, llvm::detail::DenseM
> 3.23% (anonymous namespace)::HorizontalReduction::tryToReduce(llvm::slpvectorizer::BoUpSLP&, llvm::TargetTransformInfo*)::{lambda(bool)#1}::operator()
> 0.70% memset
> + 1.82% tryToVectorizeSequence<llvm::Value>
OK, thanks, I will improve it on Monday. We can avoid creating the SmallPtrSet multiple times, and I'll check for other possible optimizations too.
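For illustration only, here is a rough sketch of the general idea of not re-creating a set on every iteration. It is not the actual SLP vectorizer code or the planned fix: std::unordered_set stands in for SmallPtrSet so the snippet compiles without LLVM headers, and Group, countUniquePerGroupFresh and countUniquePerGroupReused are hypothetical names made up for this example.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for one reduction candidate: a group of operand
// pointers that may contain duplicates.
using Group = std::vector<const int *>;

// Baseline: a fresh set is constructed and destroyed for every group.
std::size_t countUniquePerGroupFresh(const std::vector<Group> &Groups) {
  std::size_t Total = 0;
  for (const Group &G : Groups) {
    std::unordered_set<const int *> Seen; // re-created on each iteration
    for (const int *P : G)
      Seen.insert(P);
    Total += Seen.size();
  }
  return Total;
}

// Variant: one set is hoisted out of the loop and cleared between groups,
// so the container object (and, in typical implementations, its bucket
// storage) is reused instead of being rebuilt each time.
std::size_t countUniquePerGroupReused(const std::vector<Group> &Groups) {
  std::size_t Total = 0;
  std::unordered_set<const int *> Seen;
  for (const Group &G : Groups) {
    Seen.clear();
    for (const int *P : G)
      Seen.insert(P);
    Total += Seen.size();
  }
  return Total;
}
```

Both functions return the same count; only the lifetime of the set differs. Whether this kind of hoisting is enough to recover the time seen in the profile above would have to be measured on the real input.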
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D114171/new/
https://reviews.llvm.org/D114171