[PATCH] D96434: [CSSPGO][llvm-profgen] Filter out the instructions without location info for symbolizer

Lei Wang via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Feb 10 16:41:18 PST 2021


wlei added a comment.

In D96434#2555468 <https://reviews.llvm.org/D96434#2555468>, @wenlei wrote:

>> symbolizer will return an empty call stack for them which will cause some crash later in profile unwinding.
>
> Can you share more details on what which part had trouble dealing with such cases?

Sure, it's the case with the `inlineContextEqual`(as we are discussing in the comments below), if the `locationStack` is empty, the `equal` function will crash. After I filter out all the instruction without location, then I got the out-of-boundary crash. so that's why I also need to fix `advance` and `backward`.

In our internal prototype, it gives all the instruction without location info a fake location, like <function_name, -1>, -1 is the location line number.
here for profgen's symbolizer, it only gives an empty stack.



================
Comment at: llvm/tools/llvm-profgen/PerfReader.cpp:60
       uint64_t PrevIP = IP.Address;
-      IP.backward();
+      HasNext = IP.backward();
       // Break into segments for implicit call/return due to inlining
----------------
hoy wrote:
> wenlei wrote:
> > Since all code addresses for the entire binary is record in a single vector, this `HasNext` check is only going to help for the first/last instruction of the entire binary. 
> > 
> > Is the problem you're seeing very specific to first/last instruction of the entire binary? I thought instruction without location could happen anywhere..
> Yeah, empty inline context should not be uncommon. Is the case where `IP.backward()` resulted in an invalid IP that crashed the `inlineContextEqual` check?
> this HasNext check is only going to help for the first/last instruction of the entire binary.
>IP.backward() resulted in an invalid IP that crashed the inlineContextEqual check
Exactly, you both are right.

> Is the problem you're seeing very specific to first/last instruction of the entire binary? I thought instruction without location could happen anywhere..
Yes, it could happen anywhere. if the instruction without location is not the first/ last instruction, it will jump to the next valid address which also get `false` for `inlineContextEqual`, it's fine for that. Only the first/last one, it will be out of boundary.

For our internal tool, it use a fake location, like <function_name, -1> which can ensure `inlineContextEqual` always return false. and I also saw the first/last instruction is always a fake location, so there is no out of boundary issue for it.

An alternative to fix this is to do the same as our internal one, for all the empty location stack, we create a fake location, say <invalid, -1>, then I guess we don't need to check the boundary for `advance` and `backward` (theoretically we should also check but we didn't see the case first/last insn isn't a fake location for our internal tool). One trade-off is it could sightly increase the memory usage. 

Any thoughts on this?









================
Comment at: llvm/tools/llvm-profgen/ProfiledBinary.cpp:431
+    return false;
   Index++;
   Address = Binary->getAddressforIndex(Index);
----------------
wenlei wrote:
> What about still advancing the index before returning false? I.e. advancing IP past last instruction should leave IP in invalid state - this might help expose issues where return value from advance is not checked, otherwise we might be slightly accessing last instruction repeatedly. We could also set address to UINT_MAX for such case, so extra check may be avoided. Same for backward, but UINT_MIN for address.
Good suggestion! met the repeating instruction issue..

Will make the out of boundary address to UINT_MIN/UINT_MAX.


================
Comment at: llvm/tools/llvm-profgen/ProfiledBinary.h:182
   }
+  uint32_t codeAddrsSize() { return CodeAddrs.size(); }
   StringRef getPath() const { return Path; }
----------------
wenlei wrote:
> nit: `getNumInstructions` or `getNumCodeAddresses` ?
Good catch!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96434/new/

https://reviews.llvm.org/D96434



More information about the llvm-commits mailing list