[PATCH] D159460: [BOLT][NFC] Speedup YAML profile processing
Amir Ayupov via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Sep 11 15:41:19 PDT 2023
Amir marked an inline comment as done.
Amir added a comment.
In D159460#4643578 <https://reviews.llvm.org/D159460#4643578>, @maksfb wrote:
> Nice - good improvement. Do you know where the majority of the profile-processing time is spent with this change?
Here's what `perf report` shows when narrowed down to the `YAMLProfileReader` class:
Samples: 49M of event 'cycles', Event count (approx.): 19618
- 100.00% llvm-bolt
- 100.00% llvm-bolt
- 49.06% [.] llvm::bolt::YAMLProfileReader::mayHaveProfileData
+ 18.37% [.] llvm::StringMapImpl::FindKey
+ 15.31% [.] llvm::bolt::YAMLProfileReader::mayHaveProfileDat
+ 6.72% [.] llvm::bolt::BinaryFunction::forEachName<llvm::bo
+ 3.94% [.] operator delete at plt
+ 2.66% [.] llvm::bolt::RewriteInstance::selectFunctionsToPr
+ 2.06% [.] llvm::bolt::getLTOCommonName
+ 0.01% [k] asm_sysvec_apic_timer_interrupt
+ 25.63% [.] llvm::bolt::YAMLProfileReader::parseFunctionProfile
+ 22.19% [.] llvm::bolt::YAMLProfileReader::buildNameMaps
+ 1.56% [.] llvm::bolt::YAMLProfileReader::readProfile
+ 0.68% [.] llvm::bolt::YAMLProfileReader::matchProfileToFunction
+ 0.32% [.] llvm::bolt::YAMLProfileReader::parseFunctionProfile(llvm::bolt::BinaryFunction&
+ 0.32% [.] llvm::bolt::YAMLProfileReader::~YAMLProfileReader
+ 0.24% [.] llvm::bolt::YAMLProfileReader::preprocessProfile
Quick analysis:
- `mayHaveProfileData` is not called from profile {pre-,}processing. It's only called from `RewriteInstance::selectFunctionsToProcess`.
- `parseFunctionProfile` is invoked once per profile function and does the heavy lifting of attaching the profile to the CFG.
- `buildNameMaps` loops over both profile functions and binary functions, hence the cost. I tried to reduce this overhead, as described in the patch summary.
- `readProfile` is what actually reads the YAML profile, but the bulk of its overhead is in `llvm::yaml` code - see below.
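To illustrate why a step like `buildNameMaps` scales with both the number of binary functions and the number of profile functions, here is a minimal sketch. It is not the BOLT implementation: `BinaryFunctionStub`, `ProfileFunctionStub`, `buildNameMap`, and `countMatches` are hypothetical names, and `std::unordered_map` stands in for whatever map type the real code uses.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for illustration only; not the real BOLT types.
struct BinaryFunctionStub {
  std::vector<std::string> Names; // a function may be known under several names
};

struct ProfileFunctionStub {
  std::string Name;
};

using NameMap = std::unordered_map<std::string, const BinaryFunctionStub *>;

// One pass over every binary function and every one of its names.
// Cost grows with the total number of names in the binary.
NameMap buildNameMap(const std::vector<BinaryFunctionStub> &Functions) {
  NameMap Map;
  for (const BinaryFunctionStub &BF : Functions)
    for (const std::string &Name : BF.Names)
      Map.emplace(Name, &BF); // keeps the first function registered for a name
  return Map;
}

// One hash lookup per profile function; returns how many were matched.
// Cost grows with the number of functions in the profile.
std::size_t countMatches(const NameMap &Map,
                         const std::vector<ProfileFunctionStub> &Profile) {
  std::size_t Matched = 0;
  for (const ProfileFunctionStub &PF : Profile)
    if (Map.find(PF.Name) != Map.end())
      ++Matched;
  return Matched;
}
```

Even with constant-time lookups, both loops touch every entry once, so the total work is proportional to the size of the binary plus the size of the profile.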
llvm::yaml methods:
Samples: 49M of event 'cycles', Event count (approx.): 298169
- 100.00% llvm-bolt
- 100.00% llvm-bolt
+ 15.42% [.] llvm::yaml::Scanner::peekNext
+ 8.81% [.] llvm::yaml::Scanner::scanPlainScalar
+ 8.18% [.] llvm::yaml::Scanner::removeStaleSimpleKeyCandidates
+ 7.88% [.] llvm::yaml::Scanner::fetchMoreTokens
+ 4.94% [.] llvm::yaml::Document::parseBlockNode
+ 4.86% [.] llvm::yaml::Scanner::getNext
+ 4.53% [.] llvm::yaml::Scanner::scanToNextToken
+ 4.49% [.] llvm::yaml::Input::createHNodes
As you can see, `createHNodes` is no longer the most expensive part. I don't see any easy optimization opportunities here.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D159460/new/
https://reviews.llvm.org/D159460