[PATCH] D94110: [CSSPGO][llvm-profgen] Aggregate samples on call frame trie to speed up profile generation
Lei Wang via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Jan 27 14:01:57 PST 2021
wlei added a comment.
> Thanks for the quick experiment! Given that we don't see immediate speed up from global trie, I'm inclined to just use what you have in this patch, and defer further improvement for the future. What do you think?
Yeah, sounds good to me! It didn't affect the functionality, then in the future we can try to improve it when we get to solve the sharing context issue.
================
Comment at: llvm/tools/llvm-profgen/PerfReader.h:229
+ ProfiledFrame DummyTrieRoot;
+ ProfiledFrame *FrameCurrentLeaf;
// Used to fall through the LBR stack
----------------
wenlei wrote:
> FrameCurrentLeaf -> CurrentLeafFrame?
renamed!
================
Comment at: llvm/tools/llvm-profgen/PerfReader.h:484
uint64_t Repeat);
- SampleCounter &getOrCreateCounter(const ProfiledBinary *Binary,
- std::list<uint64_t> &CallStack);
- // Use pseudo probe based context key to get the sample counter
- // A context stands for a call path from 'main' to an uninlined
- // callee with all inline frames recovered on that path. The probes
- // belonging to that call path is the probes either originated from
- // the callee or from any functions inlined into the callee. Since
- // pseudo probes are organized in a tri-tree style after decoded,
- // the tree path from the tri-tree root (which is the uninlined
- // callee) to the probe node forms an inline context.
- // Here we use a list of probe(pointer) as the context key to speed up
- // aggregation and the final context string will be generate in
- // ProfileGenerator
- SampleCounter &getOrCreateCounterForProbe(const ProfiledBinary *Binary,
- std::list<uint64_t> &CallStack);
-
private:
ContextSampleCounterMap *CtxCounterMap;
----------------
wenlei wrote:
> this private can be removed since there's one on line 456.
Good catch!
================
Comment at: llvm/tools/llvm-profgen/PerfReader.h:206
+ const uint64_t Address = 0;
+ Frame *Parent = nullptr;
+ SampleVector RangeSamples;
----------------
wenlei wrote:
> wlei wrote:
> > wenlei wrote:
> > > Raw pointer is used for parent, but unique_ptr is used for child. Would be good to keep consistent.
> > Oh, the child is a new allocated instance so we use smart pointer for it, but the parent pointer can always point to a pre-allocated instance or null(for root). Also seems it would cause a recursive deconstruction of unique_ptr error if making parent unique_ptr.
> Definitely not unique_ptr for parent and multiple children would keep pointer to same parent. I meant to say consistency between smart pointer vs raw pointer, and in the case of parent, a shared_ptr?
Just tried the share_ptr, it failed, it seems there is a cycle object free issue, something like the parent object free its children's field then it will free the Parent shared_ptr, the Parent shared_ptr again tried to free its parent object cause a cycle free failure. I code like below(use `this` to initial Parent ptr)
```
std::share_ptr<ProfiledFrame> Parent;
...
ProfiledFrame(uint64_t Addr = 0, ProfiledFrame *P = nullptr)
: Address(Addr), Parent(P) {}
ProfiledFrame *getOrCreateChildFrame(uint64_t Address) {
auto Ret = Children.emplace(
Address, std::make_unique<ProfiledFrame>(Address, this));
return Ret.first->second.get();
}
```
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D94110/new/
https://reviews.llvm.org/D94110
More information about the llvm-commits
mailing list