[PATCH] D107529: [llvm-profgen] Fix bug of loop scope mismatch

Lei Wang via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Aug 5 16:27:29 PDT 2021


wlei added inline comments.


================
Comment at: llvm/tools/llvm-profgen/ProfileGenerator.cpp:523
   extractProbesFromRange(RangeCounter, ProbeCounter, Binary);
   std::unordered_map<MCDecodedPseudoProbeInlineTree *, FunctionSamples *>
       FrameSamples;
----------------
hoy wrote:
> hoy wrote:
> > wlei wrote:
> > > hoy wrote:
> > > > wlei wrote:
> > > > > hoy wrote:
> > > > > > wlei wrote:
> > > > > > > hoy wrote:
> > > > > > > > wlei wrote:
> > > > > > > > > wlei wrote:
> > > > > > > > > > wlei wrote:
> > > > > > > > > > > hoy wrote:
> > > > > > > > > > > > wlei wrote:
> > > > > > > > > > > > > I'm wondering here if `FunctionSamples *` should be `Vector<FunctionSamples *>` as same frame can point to different  FunctionSamples?
> > > > > > > > > > > > An inline context under a given calling context should uniquely identify an inline frame, therefore can only point to one leaf profile. Am I missing anything?
> > > > > > > > > > > Sorry I meant `std::set<FunctionSamples *>`.
> > > > > > > > > > > 
> > > > > > > > > > > I'm thinking the multiple probes cases from different call stack and they have the same leaf probe. like
> > > > > > > > > > > [Probe1, Probe3] and [Probe1, Probe2]
> > > > > > > > > > > 
> > > > > > > > > > > Here Probe1 will have two `FunctionSamples`.
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > Oh. Here in `populateBodySamplesWithProbes` has already guaranteed to have unique call stack, so no multiple probe cases. Sorry for the confusing.
> > > > > > > > > @hoy I found that the `probe->getInlineTreeNode()` is not the leaf frame so here it's not unique, the unique one should be like "probe + leaf function" and the leaf function is given from its FuncDesc. It happened that there are different the leaf functions.
> > > > > > > > > 
> > > > > > > > > See the reference below, "Note that the context from probe doesn't include leaf frame, hence we need to retrieve and prepend leaf if requested.", when we extract the inline context, we still need to append the leaf frame.
> > > > > > > > > ```
> > > > > > > > > void MCPseudoProbeDecoder::getInlineContextForProbe(
> > > > > > > > >     const MCDecodedPseudoProbe *Probe,
> > > > > > > > >     SmallVectorImpl<std::string> &InlineContextStack, bool IncludeLeaf) const {
> > > > > > > > >   Probe->getInlineContext(InlineContextStack, GUID2FuncDescMap, true);
> > > > > > > > >   if (!IncludeLeaf)
> > > > > > > > >     return;
> > > > > > > > >   // Note that the context from probe doesn't include leaf frame,
> > > > > > > > >   // hence we need to retrieve and prepend leaf if requested.
> > > > > > > > >   const auto *FuncDesc = getFuncDescForGUID(Probe->getGuid());
> > > > > > > > >   InlineContextStack.emplace_back(FuncDesc->FuncName + ":" +
> > > > > > > > >                                   Twine(Probe->getIndex()).str());
> > > > > > > > > }
> > > > > > > > > ``` 
> > > > > > > > > 
> > > > > > > > > So I changed to use `unordered_set` for multiple leaf frame FunctionSample. See if this looks good to you?
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > I see, a callsite can lead to different leaf frames, if it is an indirect callsite. That's a good find, thanks! Yeah, using `unordered_set` sounds good to me. Does it close the gap you saw with the loop scope change?
> > > > > > > Yes, Now the output of this change is identical to the one without this change while building the `clang` script
> > > > > > That's nice. Wondering what could cause same inline context to point to different frames. For the clang pass1 build, ICP shouldn't be triggered? 
> > > > > My understanding is the same frame can point to different inline contexts, same inline context should point to same frame. e.g. probe1(GUID:foo) and probe2(GUID:bar) can point to the same frame(`inlinetree`) and when the `inlinetree`'s context is extracted like `main @ goo` and  there full context will be
> > > > > probe1 : `main @ goo @ foo`
> > > > > probe2: `main @ goo @ bar`.
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > In your example, does `foo` and `bar` share the same callsite in `goo`? 
> > > Yes, that's why here changed to unordered_set, and as you mentioned, foo and bar is indirect call
> > I was wondering for clang pass1 build, how was the indirect call promoted. We don't use profile for pass1 thus ICP should not be triggered?
> I guess two probes with inline contexts can be merged, and the merged inline context is trimmed to empty of a common part of the inputs. Anyway, that's unrelated to this patch. Using an unordered_set sounds good to me. 
Thanks for the discussion! That's interesting question, I'm trying to find something in the `clang` asm, but it's too large to debug. I will land this first and later try to run on SPEC to see if something easy to catch.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107529/new/

https://reviews.llvm.org/D107529



More information about the llvm-commits mailing list