[PATCH] D121179: [memprof] Store callsite metadata with memprof records.
Teresa Johnson via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Mar 9 12:14:09 PST 2022
tejohnson added inline comments.
================
Comment at: llvm/lib/ProfileData/RawMemProfReader.cpp:324
+
+ // The leaf function is the first PC in the callstack so check if unset.
+ if (LeafFunction == 0) {
----------------
snehasish wrote:
> tejohnson wrote:
> > Why can't the leaf function be an inline frame? I.e. what if we have:
> >
> > foo() { malloc(); }
> > bar() { foo(); }
> >
> > and we inline foo into bar.
> >
> > Won't the leaf be foo which will be an inline frame?
> My usage of the term "leaf" is the cause of confusion, here I identify "last non-inlined" function in the callstack as leaf. Since we will annotate the profile post-inlining, this is the function where the records will be placed. In the example above, bar will have the memprof record entry. Let me know how you want me to reword this to make it clearer.
>
> Also a note about ordering of entries in the callstack which represented by a list of MemProfRecord::Frames. For example
>
> qux() { malloc(); }
> inline foo() { qux(); }
> bar() { foo(); }
>
> is represented as:
>
> bar (isInline: false); foo (isInline: true); qux (isInline:false)
>
> as opposed to the expectation:
>
> foo (isInline: true); bar (isInline: false); qux (isInline:false)
>
> Based on your comment, this may be surprising since foo is the "leaf" function. I chose this representation to have deterministic access to the function id where the memprof record is to be applied, i.e. it is always at index 0. However, now that we have a map of function ids to memprof records in the reader (previously we just had a list of memprof records) we can change the order of to be more intuitive (i.e. on L339 we will append the frames in reverse). Wdyt?
>
>
>
> My usage of the term "leaf" is the cause of confusion, here I identify "last non-inlined" function in the callstack as leaf. Since we will annotate the profile post-inlining, this is the function where the records will be placed. In the example above, bar will have the memprof record entry. Let me know how you want me to reword this to make it clearer.
An issue with this is that foo may have been inlined into bar before or after the point of matching. I.e. the inline frames in the profile are those inlined anywhere in the compiler: either in early inlining (which is done before typical PGO matching), or in the usual inline optimization pass (or in the rarer but supported case of an instrumented build with ThinLTO, in the LTO backend's invocation of the inliner).
So in my example, when we match we may or may not have inlined foo into bar at that point. Which raises the question of which function(s) should get the memprof data in the profile. I would argue that at the very least foo() should, even if it is marked as an inline frame (because when we match it may not yet have been inlined, and in fact based on profile info we may or may not make the same inlining decisions later in the compile either). Possibly it should also be on the non-inlined caller (and any other intervening inline frames), to make matching easier, in the case where the inline was done by the early inliner before matching.
Actually, just realized this same question affects where we attach the callsites profiles below given a set of inline frames.
>
> Also a note about ordering of entries in the callstack which represented by a list of MemProfRecord::Frames. For example
>
> qux() { malloc(); }
> inline foo() { qux(); }
> bar() { foo(); }
>
> is represented as:
>
> bar (isInline: false); foo (isInline: true); qux (isInline:false)
This makes sense to me since bar calls foo calls qux. Also, just a note that the isInline here assumes we only inlined foo into bar (which is suggested but not technically guaranteed by the inline keyword, and also we could have additional inlining by the optimizer).
>
> as opposed to the expectation:
>
> foo (isInline: true); bar (isInline: false); qux (isInline:false)
I'm not sure why this one would be the expectation? It is neither bottom up nor top down.
>
> Based on your comment, this may be surprising since foo is the "leaf" function. I chose this representation to have deterministic access to the function id where the memprof record is to be applied, i.e. it is always at index 0. However, now that we have a map of function ids to memprof records in the reader (previously we just had a list of memprof records) we can change the order of to be more intuitive (i.e. on L339 we will append the frames in reverse). Wdyt?
>
>
>
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D121179/new/
https://reviews.llvm.org/D121179
More information about the llvm-commits
mailing list