[PATCH] D159460: [BOLT][NFC] Speedup YAML profile processing

Mon Sep 11 15:41:19 PDT 2023

Amir marked an inline comment as done.
Amir added a comment.

In D159460#4643578 <https://reviews.llvm.org/D159460#4643578>, @maksfb wrote:

> Nice - good improvement. Do you know where the majority of the profile-processing time is spent with this change?

Here's what `perf report` shows when narrowed down to the `YAMLProfileReader` class:

  Samples: 49M of event 'cycles', Event count (approx.): 19618

  -  100.00%              llvm-bolt
     -  100.00%              llvm-bolt
        -   49.06%              [.] llvm::bolt::YAMLProfileReader::mayHaveProfileData
           +   18.37%              [.] llvm::StringMapImpl::FindKey
           +   15.31%              [.] llvm::bolt::YAMLProfileReader::mayHaveProfileDat
           +    6.72%              [.] llvm::bolt::BinaryFunction::forEachName<llvm::bo
           +    3.94%              [.] operator delete at plt
           +    2.66%              [.] llvm::bolt::RewriteInstance::selectFunctionsToPr
           +    2.06%              [.] llvm::bolt::getLTOCommonName
           +    0.01%              [k] asm_sysvec_apic_timer_interrupt
        +   25.63%              [.] llvm::bolt::YAMLProfileReader::parseFunctionProfile
        +   22.19%              [.] llvm::bolt::YAMLProfileReader::buildNameMaps
        +    1.56%              [.] llvm::bolt::YAMLProfileReader::readProfile
        +    0.68%              [.] llvm::bolt::YAMLProfileReader::matchProfileToFunction
        +    0.32%              [.] llvm::bolt::YAMLProfileReader::parseFunctionProfile(llvm::bolt::BinaryFunction&
        +    0.32%              [.] llvm::bolt::YAMLProfileReader::~YAMLProfileReader
        +    0.24%              [.] llvm::bolt::YAMLProfileReader::preprocessProfile

Quick analysis:

- `mayHaveProfileData` is not called from profile {pre-,}processing. It's only called from `RewriteInstance::selectFunctionsToProcess`.
- `parseFunctionProfile` is invoked once per profile function and does the heavy lifting of attaching the profile to the CFG.
- `buildNameMaps` loops over both profile functions and binary functions, hence the cost. I tried to reduce this overhead, as described in the summary.
- `readProfile` actually reads the YAML profile, but the bulk of the overhead is in `llvm::yaml` code - see below.
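
To illustrate the cost pattern behind `buildNameMaps` and `mayHaveProfileData`, here is a minimal sketch of a name index that is built once and then queried per binary function. It is not BOLT's implementation: `ProfileFunction`, `BinaryFunctionStub`, `buildNameIndex` and `mayHaveProfile` are hypothetical names, and `std::unordered_map` stands in for `llvm::StringMap` so the snippet compiles without LLVM.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for illustration; these are not BOLT's real classes.
struct ProfileFunction {
  std::string Name;
  unsigned ExecCount;
};

struct BinaryFunctionStub {
  std::vector<std::string> Names; // a function may be known under several names
};

// Maps a function name to its position in the profile.
using NameIndex = std::unordered_map<std::string, std::size_t>;

// One pass over the profile functions: the cost grows with the profile size.
NameIndex buildNameIndex(const std::vector<ProfileFunction> &Profile) {
  NameIndex Index;
  Index.reserve(Profile.size());
  for (std::size_t I = 0; I < Profile.size(); ++I)
    Index.emplace(Profile[I].Name, I); // keeps the first entry on duplicates
  return Index;
}

// One hash lookup per name of the binary function; returns true on first hit.
bool mayHaveProfile(const NameIndex &Index, const BinaryFunctionStub &BF) {
  for (const std::string &Name : BF.Names)
    if (Index.find(Name) != Index.end())
      return true;
  return false;
}
```

With this shape, the query is called once per binary function and hashes every name of that function, so on a large binary the hash lookups alone can add up, which would be consistent with `StringMapImpl::FindKey` showing up under `mayHaveProfileData` above.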

`llvm::yaml` methods:

  Samples: 49M of event 'cycles', Event count (approx.): 298169

  -  100.00%              llvm-bolt
     -  100.00%              llvm-bolt
        +   15.42%              [.] llvm::yaml::Scanner::peekNext
        +    8.81%              [.] llvm::yaml::Scanner::scanPlainScalar
        +    8.18%              [.] llvm::yaml::Scanner::removeStaleSimpleKeyCandidates
        +    7.88%              [.] llvm::yaml::Scanner::fetchMoreTokens
        +    4.94%              [.] llvm::yaml::Document::parseBlockNode
        +    4.86%              [.] llvm::yaml::Scanner::getNext
        +    4.53%              [.] llvm::yaml::Scanner::scanToNextToken
        +    4.49%              [.] llvm::yaml::Input::createHNodes

As you can see, `createHNodes` is no longer the most expensive part. I don't see any easy optimization opportunities here.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D159460/new/

https://reviews.llvm.org/D159460