[PATCH] D123271: [llvm-profgen] Filter out invalid LBR ranges.

Thu Apr 7 11:09:03 PDT 2022

hoy added inline comments.

================
Comment at: llvm/tools/llvm-profgen/PerfReader.cpp:286-287

+  // Skip unwinding the rest of LBR trace when a bogus range is seen.
+  bool SkipUnwinding = false;
+
----------------
wenlei wrote:
> Is it critical to skip unwinding instead of ignore the entire record for invalid range? If not, perhaps it's better to keep it simple and just ignore such record. I imagine such invalid ranges are rare. 
Sort of. With one of internal large services we have about 5% of such invalid ranges. Ignoring the whole trace where such ranges are in may not be trivial. 

   warning: 5.13%(77654546/1512342539) of profiled ranges is bogus.

================
Comment at: llvm/tools/llvm-profgen/PerfReader.cpp:300-301
       // break into smaller sub section each with its own calling context.
-      unwindLinear(State, Repeat);
+      if (!unwindLinear(State, Repeat))
+        SkipUnwinding = true;
     }
----------------
wenlei wrote:
> Theoretically any kind of unwind can discover bad state, so this is not unique to unwindLinear. I suggest we have an invalid state flag for UnwindState, which gets checked for all isXXState (may also added isValidState). This way we also keep the API for unwindXX consistent, and also avoid special checking after unwindLinear. 
Sounds good.

================
Comment at: llvm/tools/llvm-profgen/PerfReader.cpp:1202
         !Binary->offsetIsTransfer(EndOffset)) {
-      InstNotBoundary++;
+      InstNotBoundary += I.second;
       WarnInvalidRange(StartOffset, EndOffset, EndNotBoundaryMsg);
----------------
wenlei wrote:
> I think the choice to use unique ranges was intentional. What motivated this change?  
Can you remind me of why it was intentional?

I think it's useful to give a number of how many samples are invalid, though some of them could be the same. This reflects the portion of samples will be dropped.

================
Comment at: llvm/tools/llvm-profgen/PerfReader.cpp:1218-1221
+    if (StartOffset > EndOffset) {
+      BogusRange += I.second;
+      WarnInvalidRange(StartOffset, EndOffset, BogusRangeMsg);
+    }
----------------
wenlei wrote:
> Such invalid ranges doesn't seem to be recorded anyways, how do we get those ranges? 
Right, they'll be abandoned. The warning here is to tell how many such samples would be dropped so we kinda getting a sense how important the issue is.

================
Comment at: llvm/tools/llvm-profgen/PerfReader.cpp:1230
                      "of profiled ranges do cross function boundaries.");
+  emitWarningSummary(BogusRange, TotalRangeNum, "of profiled ranges is bogus.");
 }
----------------
wenlei wrote:
> It is unclear to user what "bogus" means. And one could argue that ranges crossing function boundaries (one above) is also bogus. Suggest to be specific about what this is so user knows what this means, e.g. `... of profiled ranges have range start after range end`.
Done.

================
Comment at: llvm/tools/llvm-profgen/PerfReader.h:297
   void advanceLBR() { LBRIndex++; }
-  ProfiledFrame *getParentFrame() { return CurrentLeafFrame->Parent; }
+  ProfiledFrame *getParentFrame() {
+    return CurrentLeafFrame == &DummyTrieRoot ? CurrentLeafFrame
----------------
wenlei wrote:
> What motivated this change? Conceptually parent frame of dummy root should indeed be nullptr.
It is to handle the diversion of probe samples and dwarf samples. With probe, LBR ranges are reported towards its parent frame (calling context), while w/o probe, ranges are reported towards the leaf frame. See `VirtualUnwinder::unwindLinear`. When unwinding is at a bad state, the call stack is reset to `DummyTrieRoot`, thus leading to a null parent technically for the probe case which will cause AV here and there.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123271/new/

https://reviews.llvm.org/D123271