[PATCH] D104546: [CSSPGO][llvm-profgen] Handle return to external transition.
    Wei Mi via Phabricator via llvm-commits 
    llvm-commits at lists.llvm.org
       
    Tue Jun 22 14:33:53 PDT 2021
    
    
  
wmi accepted this revision.
wmi added inline comments.
================
Comment at: llvm/tools/llvm-profgen/PerfReader.cpp:464
+
+      if (Binary->addressIsReturn(Src)) {
+        // In a callback case, a return from internal code, say A, to external
----------------
wenlei wrote:
> wmi wrote:
> > hoy wrote:
> > > wmi wrote:
> > > > wmi wrote:
> > > > > wenlei wrote:
> > > > > > hoy wrote:
> > > > > > > wenlei wrote:
> > > > > > > > hoy wrote:
> > > > > > > > > hoy wrote:
> > > > > > > > > > wenlei wrote:
> > > > > > > > > > > If we have a transition from A to B, which is in the form of A calling <external> and <external> calling B, we would form artificial branch from A to B as well. But the same issue with unwinder still exists, right?
> > > > > > > > > > > 
> > > > > > > > > > > So IIUC, the bail out condition should be A and B are not the same frame/function? In order for artificial branch to work with unwinder, the original execution flow can't involve extra external frames. 
> > > > > > > > > > The processing order here is in reversed timing order. E..g, we are processing a trace from latest to oldest like:
> > > > > > > > > > 
> > > > > > > > > > [C, D] [E, F] [ext, B] [ext, ext] .... , [A, ext], and now we are at [A, ext], without this change we would group [ext, B] (cached by `PrevTrDst`) and [A, ext] to form an artificial branch. With this change, we only keep processed traces up to [E, F].
> > > > > > > > > > 
> > > > > > > > > > Checking A and B are not in the same function can narrow down the cases. But even if they are in the same function, returning from A to runtime and then form to A ends up with an artificial branch meaning A returning to A can still be confusing, since there may not be a callsite corresponding to that.
> > > > > > > > > > If we have a transition from A to B, which is in the form of A calling <external> and <external> calling B, we would form artificial branch from A to B as well. But the same issue with unwinder still exists, right?
> > > > > > > > > 
> > > > > > > > > This is fine so far. The problem with returning from A to B is that it causes the unwinder to treat the instruction in front of B is a call instruction, which is not always the case.
> > > > > > > > > Checking A and B are not in the same function can narrow down the cases. But even if they are in the same function, returning from A to runtime and then form to A ends up with an artificial branch meaning A returning to A can still be confusing, since there may not be a callsite corresponding to that.
> > > > > > > > 
> > > > > > > > Fair point. 
> > > > > > > > 
> > > > > > > > > > If we have a transition from A to B, which is in the form of A calling <external> and <external> calling B, we would form artificial branch from A to B as well. But the same issue with unwinder still exists, right?
> > > > > > > > > 
> > > > > > > > > This is fine so far. The problem with returning from A to B is that it causes the unwinder to treat the instruction in front of B is a call instruction, which is not always the case.
> > > > > > > > 
> > > > > > > > For this case, we don't have the problem of expecting a non-existing call site, but I was wondering if unwinder can yield correct results. When unwinding through artificial branch A->B, we see it's a call and will pop the frame of B, however in reality, we should pop frames for both B and <external>? 
> > > > > > > If there is a preceding return from B to somewhere which pushes a frame, then the artificial call from A to B should naturally pop it. This likely means calling into the runtime from A, and then calling back to user code B, and then returning to user code, which is likely to happen due to tail calls.
> > > > > > > 
> > > > > > > If there is no preceding return in LBR, the call stack should be empty since the call stack trace truncation happens at external calling B point. The leftover traces will be treated without a calling context.
> > > > > > ok, forgot about the truncation, A->external->B will never exist for calling context.
> > > > > Didn't follow the related change closely so I have some basic questions. 
> > > > > 
> > > > > When we need artificial branch?
> > > > > 
> > > > > In which case the truncation will happen?
> > > > 
> > > > 
> > > > 
> > > An artificial branch is used to model a round-trip transition from user code to runtime and back. For example, for a LBR trace like `[C, D] [E, F] [ext, B] [ext, ext] .... , [A, ext], [M,N], [P,Q]`, an artificial branch `[A, B]` is created to connect the trace before and after the user/external transition, so that the original LBR trace is made use of maximally.
> > > 
> > > Here by truncation I mean the callstack trace truncation. A hybrid trace may look like
> > > 
> > > 	          40057d
> > > 	          4005b9
> > > 	    7f67469af555
> > >                   400579
> > >  0x40059f/0x400553/P/-/-/0  0x40059f/0x400553/P/-/-/0  0x40059f/0x400553/P/-/-/0  **0x7f67469af555/0x4005b98/**P/-/-/0 **0x400578/0x7f67469af530**/P/-/-/0   0x40099f/0x400588/P/-/-/0
> > > 
> > > Note that the return address (0x7f67469af555) sitting in the middle of the call stack indicates an external call back to user code. The callstack will be truncated to just the upper two entries since there isn't a good way to use a calling context with external code in the compiler. However, we don't truncate the LBR traces at the point of external calls. We'd like to fully use the trace before and after the transition, thought they may only be under a partial calling context.
> > Thanks for the detailed explanation. 
> > 
> > For artificial branch, sounds like you drop the parts belonging to external. I can understand for the branch records which are not calls/rets, it is ok to drop them since it is unrelated to the current binary. But if the branch records are calls/rets, they may correlate with the frames in the unwinding stack. I guess that is why you also have callstack trace truncation. 
> > 
> > What if we keep all the calls/rets related branch records no matter they are ingoing/outgoing/external and keep all the frames in the callstack, can we match them and get the correct context easier?
> External code may have frame pointer omitted, and stack sample from external code can be inaccurate. Additionally, we can't really leverage external frames in the context in the compiler for inlining.
I see. Thanks for explaining it.
Repository:
  rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104546/new/
https://reviews.llvm.org/D104546
    
    
More information about the llvm-commits
mailing list