[PATCH] D76336: [DWARF] Emit DW_AT_call_pc for tail calls

David Blaikie via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Mar 19 18:40:03 PDT 2020


dblaikie added a comment.

In D76336#1932330 <https://reviews.llvm.org/D76336#1932330>, @vsk wrote:

> In D76336#1932313 <https://reviews.llvm.org/D76336#1932313>, @dblaikie wrote:
>
> > >> @dblaikie wrote:
> > >> why is this only relevant in optimized builds?
> > > 
> > > I think the main benefit of the call-site information is using it together with call-site-parameters, used for computing the actual value of a parameter, even if the location of the parameter was not provided (optimized-out).
> >
> > That presents some interestingly tricky challenges. We have -fstandalone-debug for if you are building one part of a program with debug info and others without - but there's nothing equivalent for if you're building part of a program optimized but other parts unoptimized. And this heuristic (only emit these attributes for optimized code) assumes the caller and callee are both equally optimized/similarly compiled - which isn't necessarily true.
>
>
> That's ok though, because a debugger can handle call site entries being only partially available. I.e. there's no reason (afaik) for mixing and matching optimized/unoptimized .o's to regress debugger features enabled by call site entries.


Ah, I wasn't suggesting it'd regress functionality - but that the heuristic of "only emit these for optimized code" was just that, a heuristic with some false negatives (or positives, or whichever way you think of it). Partly, then, I'm asking the question: is there a more accurate way we could determine when to emit these attributes, rather than using a frontend is/isn't-optimized heuristic?
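
To make the "false negative" concrete, here is a toy model of the heuristic. It is not LLVM code: the `Fn` struct and all three helper names are hypothetical, and the assumption that an entry "would help" exactly when the callee is optimized is a simplification of the discussion above.

```cpp
// Hypothetical model, not LLVM code: a function is either optimized or not.
struct Fn {
  const char *name;
  bool optimized;
};

// The heuristic under discussion: describe call sites only when the
// caller was built optimized.
bool heuristicEmits(const Fn &caller) { return caller.optimized; }

// Simplifying assumption: call site parameters help when the callee is
// optimized, since its parameter locations may be optimized out.
bool entryWouldHelp(const Fn &callee) { return callee.optimized; }

// A false negative: the entry would help, but the heuristic skips it.
bool isFalseNegative(const Fn &caller, const Fn &callee) {
  return entryWouldHelp(callee) && !heuristicEmits(caller);
}
```

Of the four caller/callee combinations, only "unoptimized caller, optimized callee" is a false negative under this model.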

>>> That improves debugging user experience when debugging optimized code. In addition, in the case of tail calls, the call_site debug info is used for printing artificial call frames for the tail calls (and tail calls are typical to optimized code?).
>> 
>> The tail call case is easier - since that's the caller-side. It'd probably be better to just emit that on any tail call, optimized or unoptimized code - I guess I mean, ideally the choice wouldn't be made at the frontend, but at the backend if the call ends up being a tail call.
> 
> We only want to emit call site entries at tail-calling sites when the caller has debug info, though. We do that today by relying on the DIFlagAllCallsDescribed attribute, which the frontend provides.

"when the caller has debug info" - I'm probably misunderstanding what you mean there. Of course the caller has to have debug info (a DW_TAG_subprogram for the calling function) to describe the call site, since the call site tag goes inside the caller's subprogram tag.

What I meant was about other call sites, not the tail-calling case: the callee might be optimized while the caller is unoptimized, so relying on "is the caller optimized" to decide whether to describe the call site misses some cases. If the caller is unoptimized and the callee is optimized, it'd improve the user experience if that call (to the optimized function) had a call_site with call site parameters to help debuggability of the optimized function, if I understand correctly.
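
A minimal source-level sketch of that mixed case, assuming the file is built with clang at -O2. The function names are made up for illustration, and whether the location of `x` is actually optimized out depends on the compiler and target.

```cpp
// optnone is a clang attribute; expand to nothing on other compilers.
#if defined(__clang__)
#define UNOPTIMIZED __attribute__((optnone))
#else
#define UNOPTIMIZED
#endif

// Callee: optimized along with the rest of the translation unit, so the
// location of `x` may not be available for the whole function body.
__attribute__((noinline)) int callee(int x) {
  return x * 2 + 1;
}

// Caller: kept unoptimized even in an optimized build. Under an
// "is the caller optimized" heuristic, this call gets no call site entry.
UNOPTIMIZED int caller(int v) {
  return callee(v + 1);
}
```

Here a call site parameter entry at the call in `caller` is what would let a debugger recover `x` inside `callee`, yet the caller-side heuristic is the one that skips it.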

> Also fwiw the artificial frames debugger feature doesn't work if only the tail-calling sites are described -- all of the calls have to be described for the debugger to reconstruct feasible paths through the call graph.
> 
>> (I'm thinking of LTO situations, attribute optnone, other things like that - the frontend doesn't really know if something is optimized or not & really you can't tell if a callee is optimized because it's in another translation unit)
> 
> Hm, oh, good point. But, 99+% of the time, isn't the workflow to compile with -flto=thin + -O{1,2,3,s,z}? We handle that fine. But I guess if you're doing `-O0 -disable-O0-optnone -flto`, you wouldn't get call site entries. Hrm. Does that come up much? I guess we could fix that by adding a frontend flag?

I was just thinking of straight -O0, or __attribute__((optnone)) - that could produce the "caller is unoptimized but callee is optimized" case, where the absence of a call_site tag is a failure of the heuristic (a false negative). Not the end of the world, and I'm not sure there's a better solution than the heuristic you've got, but just articulating a problem there. How much does that come up? Well, as much as optnone/O0-preserved-through-LTO comes up, which was originally implemented for Sony - apparently their users use this as a debugging technique to maintain performance of the rest of the program while making parts of it more debuggable by using -O0.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76336/new/

https://reviews.llvm.org/D76336




