[PATCH] D123271: [llvm-profgen] Filter out invalid LBR ranges.

Thu Apr 7 18:14:41 PDT 2022

wenlei added inline comments.

================
Comment at: llvm/tools/llvm-profgen/PerfReader.cpp:304

-    if (isCallState(State)) {
+    if (isInvalidState(State)) {
+      // Skip unwinding the rest of LBR trace. Reset the stack and update the
----------------
hoy wrote:
> wenlei wrote:
> > nit: put this towards the end of the if-else sequence as it's supposed to be a corner case. 
> > 
> > ```
> > } else if (isValidState(State)) {
> >     // Unwind branches
> >     ...
> >    unwindBranch(State);
> > }
> > else {
> >     // Invalid state
> > }
> > 
> > ```
> That'd require to change `isCallState` ... to. check for the invalid flag. It only checks if the current ip corresponds to a call inst.
Yes, I think we should change all isXXState to check validness first anyways. An invalid state isn't any of the valid states.

================
Comment at: llvm/tools/llvm-profgen/PerfReader.cpp:305-308
+      // Skip unwinding the rest of LBR trace. Reset the stack and update the
+      // state so that the rest of the trace can still be processed in a
+      // context-insensitive way, i.e, all ranges will be counted towards
+      // the root context.
----------------
hoy wrote:
> wenlei wrote:
> > > all ranges will be counted towards the root context.
> > 
> > When we skip unwinding, we still have the context carried with probe/dwarf, so it's not always going to be base context, right? 
> Yes, ranges themselves carry inline contexts. The wording may be confusing. I'm changing it to "the rest of the trace can still be processed as if they do not have a calling context."
they do not have a calling context. -> they do not have stack sample?

================
Comment at: llvm/tools/llvm-profgen/PerfReader.cpp:1202
         !Binary->offsetIsTransfer(EndOffset)) {
-      InstNotBoundary++;
+      InstNotBoundary += I.second;
       WarnInvalidRange(StartOffset, EndOffset, EndNotBoundaryMsg);
----------------
hoy wrote:
> wenlei wrote:
> > hoy wrote:
> > > wenlei wrote:
> > > > I think the choice to use unique ranges was intentional. What motivated this change?  
> > > Can you remind me of why it was intentional?
> > > 
> > > I think it's useful to give a number of how many samples are invalid, though some of them could be the same. This reflects the portion of samples will be dropped.
> > How many unique instance of one issue is an indicator of how wide spread such issue is. The way we report these stats are also aligned with how it was implemented - one "profiled range" is one unique range no matter that count it has. 
> > 
> > In this case, there's 5% by your counting, what's the % with original counting?
> > How many unique instance of one issue is an indicator of how wide spread such issue is. The way we report these stats are also aligned with how it was implemented - one "profiled range" is one unique range no matter that count it has. 
> 
> Not sure but feel like knowing the real number of samples to be dropped is more helpful, at least it tells how many samples we are losing in reality. For example, we aggregate samples by their look and we have ninety nine A and one B. If B is invalid, previous implementation will tell 50% of samples are invalid, while the current one tells 1% is invalid. Feel like knowing 1% is more helpful than 50%. WDYT?
> 
> > In this case, there's 5% by your counting, what's the % with original counting?
> 
> Original counting is about 4%. Not a big difference.
> 
> 
>   
Ok, makes sense. Then we need to change the message, something like " .. samples are from ranges ..", instead of " ... profiled ranges.. "

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123271/new/

https://reviews.llvm.org/D123271