[llvm] r246406 - [dsymutil] Fix handling of inlined_subprogram low_pcs
David Blaikie via llvm-commits
llvm-commits at lists.llvm.org
Wed Sep 9 08:36:20 PDT 2015
On Wed, Sep 9, 2015 at 8:16 AM, Frédéric Riss <friss at apple.com> wrote:
> On Sep 8, 2015, at 10:09 PM, David Blaikie <dblaikie at gmail.com> wrote:
> On Tue, Sep 8, 2015 at 1:10 PM, Frédéric Riss <friss at apple.com> wrote:
>> On Sep 8, 2015, at 12:24 PM, David Blaikie <dblaikie at gmail.com> wrote:
>> On Mon, Aug 31, 2015 at 11:10 AM, Frédéric Riss <friss at apple.com> wrote:
>>> On Aug 31, 2015, at 9:07 AM, David Blaikie <dblaikie at gmail.com> wrote:
>>> On Mon, Aug 31, 2015 at 9:05 AM, David Blaikie <dblaikie at gmail.com> wrote:
>>>> On Sun, Aug 30, 2015 at 6:43 PM, Frederic Riss via llvm-commits <
>>>> llvm-commits at lists.llvm.org> wrote:
>>>>> Author: friss
>>>>> Date: Sun Aug 30 20:43:14 2015
>>>>> New Revision: 246406
>>>>> URL: http://llvm.org/viewvc/llvm-project?rev=246406&view=rev
>>>>> [dsymutil] Fix handling of inlined_subprogram low_pcs
>>>>> The value of an inlined subprogram low_pc attribute should not
>>>>> get relocated, but it can happen that it matches the enclosing
>>>>> function's start address and thus gets the generic treatment.
>>>>> Special case it to avoid applying the PC offset twice.
>>>> I'm a tad confused - do you store the low_pcs as offsets relative to
>>>> the function
>>> (sorry, bouncy shuttle to work & accidentally sent before I finished
>>> that sentence...)
>>> do you store the low_pcs as offsets relative to the function's low_pc?
>>> That's interesting - and perhaps something we should standardize/generalize
>>> to reduce relocations in all our DWARF output (but I don't think there's
>>> any standard for it yet in DWARF), but I'm not sure why that would require
>>> special casing the case where the two low_pcs are equal - wouldn't that
>>> just mean the low_pc of the inlined subroutine would be at zero offset from
>>> the subprogram's low_pc? (& still not relocated)
>>> dsymutil takes the debug map as input that only contains the function
>>> (and variables) start addresses. That’s the only thing we can count on
>>> being exact. We then do a pass over all the debug_info relocations to find
>>> the ones that correspond to those addresses (and the DIEs where we find the
>>> ‘interesting’ relocations are the ones that define which part of the DIE
>>> tree we keep). Then — once we decided what to keep — we go over the kept
>>> DIEs and we clone them, applying the relocations in the process. But note
>>> that the relocations we’ve chosen are only for the entry points, thus we
>>> need to have the code around to handle the
>>> lexical_block/inlined_subroutine, and this code doesn’t use the relocations
>>> (it applies an offset that we computed when handling the subprogram DIE).
>>> What happened here is that the generic code that applied the
>>> relocations would also patch the inlined_subroutine low_pc because the
>>> relocation was the same as the entry point. And then the code handling the
>>> low_pc attributes for the inlined_subroutine would apply the offset a
>>> second time.
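A rough sketch of the failure mode described above (hypothetical helper names, not dsymutil's actual code): the generic pass patches any attribute whose value matches an entry-point relocation, and the inlined_subroutine handler then adds the function's PC offset on top of that.

```python
# Hypothetical sketch of the bug: the generic relocation pass patches any
# low_pc that matches a known entry point, and the inlined_subroutine
# handler then applies the function's PC offset again.

def link_low_pc(object_low_pc, entry_points, pc_offset, is_inlined):
    """Return the linked low_pc for a DIE.

    object_low_pc -- the address as written in the object file
    entry_points  -- object-file addresses with 'interesting' relocations
    pc_offset     -- linked_address - object_address of the enclosing function
    """
    value = object_low_pc
    if value in entry_points:
        # Generic pass: apply the relocation (i.e. the PC offset).
        value += pc_offset
    if is_inlined:
        # inlined_subroutine handler: applies the offset computed when
        # handling the subprogram DIE. Bug: if the relocation already
        # fired above, the offset is applied twice.
        value += pc_offset
    return value

# An inlined subroutine whose low_pc equals the enclosing function's
# start address (0x1000) gets the 0x100 offset applied twice:
buggy = link_low_pc(0x1000, {0x1000}, 0x100, is_inlined=True)
# buggy == 0x1200, while the correct linked address is 0x1100
```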
>> OK - what I'm wondering is whether it would work better/as well to
>> generalize this code, rather than two distinct passes/processes.
>> I don’t think there’s a way to generalize this code. But I agree that
>> storing the low_/high_pcs as offsets from their enclosing function low_pc
>> would save quite a few relocations.
> Sorry, that wasn't what I was trying to describe,
> I must admit that I didn’t really get your ‘2 distinct passes/processes’
> so I replied to your original point. But now I think I see what you meant
> and I hope the rest of my answer did address that.
> but it's certainly something we've discussed before (actually I made a
> silly prototype of using dwarf expressions and debug address pool indices
> to do reloc sharing (using one reloc per section (macho would use one reloc
> per function, due to the implied function-sections like behavior) - never
> did get around to running good numbers on it, though)).
>> Note that there is precedent for something like this: the ranges are
>> encoded as offsets from the *CU* low_pc. Maybe it would be more natural to
>> use that then?
> Note to myself: I said ‘more natural’ above, but I didn’t really mean it
> (more in the line of the standard would have been a better expression of my
> thought). I never understood why the standard used the CU low_pc as a base.
> It’s hard to use for the compiler (cf the kludge we use by setting the CU
> low_pc to 0 when we have multiple address ranges).
Do we still set the low_pc to 0 when we have DW_AT_ranges on the CU? I
guess maybe we do - been a while since I looked. (debuggers should just
have "no base" essentially, when the CU has ranges)
> Maybe I’m missing something, but the start of the function would have been
> much easier.
Yeah, I was thinking of generalizing it a bit: "you can use a constant address
value which will be interpreted relative to the nearest enclosing low_pc" -
so even if you have a split CU, but a contiguous subprogram, you can still
share the low_pc of your subprogram. Or if you have a split subprogram but
a contiguous CU (as in the hot/cold splitting case) you could still use
that, etc. (this could happen further into subprograms too - split CU,
split subprogram, but possibly a contiguous lexical block there, etc) -
this wouldn't entirely minimize relocations, though - if you had a split
subprogram and a similarly split lexical block - the lexical block ranges
wouldn't share the base relocs of the subprogram's ranges relocs, for
example. (or if you had a split subprogram, split CU, but contiguous
lexical block - you still wouldn't get to share whichever subprogram/cu
reloc refers to the chunk that the lexical block is in)
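The "constant relative to the nearest enclosing low_pc" idea can be sketched like this (illustrative only, with made-up DIE fields rather than real DWARF form handling):

```python
# Sketch of the proposed scheme: a constant-class low_pc is resolved by
# adding it to the nearest enclosing DIE's resolved low_pc, so only the
# outermost contiguous range needs a relocated address.

def resolve_low_pc(die, parent_base=0):
    if die["low_pc_is_address"]:
        return die["low_pc"]            # an address form: relocated as today
    return parent_base + die["low_pc"]  # a constant form: offset from parent

cu         = {"low_pc": 0x1000, "low_pc_is_address": True}
subprogram = {"low_pc": 0x200,  "low_pc_is_address": False}
lex_block  = {"low_pc": 0x10,   "low_pc_is_address": False}

cu_pc  = resolve_low_pc(cu)                 # relocated: one reloc total
sp_pc  = resolve_low_pc(subprogram, cu_pc)  # 0x1200, no reloc needed
blk_pc = resolve_low_pc(lex_block, sp_pc)   # 0x1210, no reloc needed
```

As noted above, the scheme only helps while each level is contiguous; a split subprogram or split CU forces the inner DIE back to a relocated address form.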
That's why the prototype I did was fission-based, because there's already
address pooling implemented there (& we use fission anyway, so it was in
the space I was thinking of). It'd still need some extensions for ranges,
if I recall correctly, to allow ranges to use addr+offset as well. (& I
don't really think using generalized dwarf expressions is the right
solution for the addr+offset in DWARF attributes, but it was a fun way to
experiment).
> If we had a (probably/preferably compact) encoding to describe this, it
> would probably be ideal.
> DWARF4 already has this /sort/ of thing for high_pc (where it can be
> encoded as a static offset relative to the low_pc - so it's not another
> relocation). That could possibly be generalized further to allow low_pcs
> to be a static offset relative to their enclosing low_pc (if one exists,
> otherwise it would be an unacceptable encoding (this could occur for
> functions - if the CU isn't a contiguous PC range (non-CU functions in
> between CU functions, functions in other sections, etc) or if a function
> itself is discontiguous (hot/cold code splitting)).
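The existing DWARF4 mechanism referred to above can be sketched as follows (illustrative only; the form constants are the standard `DW_FORM_*` values):

```python
# Sketch of the DWARF4 rule: if DW_AT_high_pc uses a constant form
# (e.g. DW_FORM_data4) rather than an address form, its value is an
# offset from DW_AT_low_pc, so it needs no relocation of its own.

DW_FORM_addr  = 0x01  # address class: relocated absolute address
DW_FORM_data4 = 0x06  # constant class: offset from low_pc

def resolve_high_pc(low_pc, high_pc_form, high_pc_value):
    if high_pc_form == DW_FORM_addr:
        return high_pc_value           # DWARF2/3 style: absolute address
    return low_pc + high_pc_value      # DWARF4 style: low_pc + offset

# A 0x40-byte function starting at 0x1000:
assert resolve_high_pc(0x1000, DW_FORM_data4, 0x40) == 0x1040
```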
> Eric & I have bandied that around now & then, which led to the
> aforementioned prototype I played around with, but didn't go any further
> than that - my improvements to Clang's debug info emission had already
> brought it down to half the size of GCC's, so we didn't have any particular
> need to push further at the time.
> Interesting to know.
>> low_pc should just be a zero-offset relocation, right?
>> Not on mach-o. Most relocations will be of the form __text+offset. That’s
>> why there is no way for me to differentiate a __text+offset relocation that
>> references the end of one range from the exact same relocation that
>> references the beginning of another (and as the linker can tear apart
>> sections, that distinction is fundamental).
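The ambiguity can be shown with a tiny sketch (made-up layout, not real Mach-O parsing): the end of one function and the start of the next are the very same `__text+offset`, so the relocation alone cannot say which range it belongs to.

```python
# Sketch: with section-relative __text+offset relocations, the end of one
# function's range and the start of the next resolve to the same
# object-file address.

functions = [
    ("foo", 0x00, 0x40),   # foo occupies [__text+0x00, __text+0x40)
    ("bar", 0x40, 0x30),   # bar occupies [__text+0x40, __text+0x70)
]

end_of_foo   = functions[0][1] + functions[0][2]   # __text+0x40
start_of_bar = functions[1][1]                     # __text+0x40
assert end_of_foo == start_of_bar
# Because the linker can move foo and bar independently (Mach-O behaves
# as if -ffunction-sections were always on), these two identical
# relocations must resolve to different linked addresses.
```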
> OK, so you search through looking for a subprogram that has a subprogram
> low_pc at __text+offset? then assume all the other low/high pcs (and
> ranges) are relative to that function starting point? (this is how you
> remove the ambiguity of the start/end?)
> Basically yes. It’s a bit more complicated because it’s a multi-phase
> process, but the end result is that while linking the DIEs we know if we
> are in a function and we know its object file and its linked address. We
> just apply that same offset to all the other object file addresses within
> that function.
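That last step amounts to something like this (hypothetical helper, not dsymutil's actual code): compute the function's delta once, then shift every object-file address inside it by the same amount.

```python
# Sketch: once a function's linked address is known, every other address
# attribute inside it (lexical blocks, inlined call sites, ...) is
# shifted by the same delta.

def link_addresses(func_object_addr, func_linked_addr, inner_addrs):
    delta = func_linked_addr - func_object_addr
    return [addr + delta for addr in inner_addrs]

# Function at 0x100 in the object file, linked at 0x4100; a lexical
# block and an inlined call site inside it move by the same 0x4000:
assert link_addresses(0x100, 0x4100, [0x120, 0x180]) == [0x4120, 0x4180]
```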
OK, I'll see if I can understand this/explain myself:
It sounds like you search through for the subprogram DIE with the
appropriate low_pc matching the debug map entry you received, then you
update that low_pc, record the base offset of the subprogram and add that
to all the address attributes in the subprogram?
But you don't search for the low_pc of the subprogram, you just search for
any low_pcs - update them all, then do the addition as a second pass.
Things I'm confused by:
* Why does the second pass not touch the subprogram? (How does the
subprogram's high_pc get updated? Is that a special case? Does it need to
be?)
* Why does the low_pc (or low_pcs) get updated eagerly, rather than deferring
it to be handled with the second pass/addition code? (so then it wouldn't
need a special case, with another special case on top to work around it)
>> Maybe I'm not understanding/explaining very well, though.
>>> We might be able to completely remove any specific handling and just
>>> ‘promote’ all the relocations that fall inside a linked function as
>>> interesting. At the point where we triage the relocs, we are not exploring
>>> the DIE tree though, just looking at the relocation list, so it would
>>> require us to trust the size field of the debug map, and I’m not sure we
>>> can do that 100% of the time (I know that this field is not accurate, it’s
>>> usually too big because it factors in alignment, but that might not be an
>>> issue if nothing gets allocated in the alignment padding).
>> Hmm - not sure I follow this. You're suggesting that if a non-debug-aware
>> tool applied the relocations in the object file/debug info, it would
>> mangle/damage the debug info?
>> Basically yes. As I explained above, a relocation based off the __text
>> section with a constant offset could be replaced by different values
>> depending on the context. I already said that, but I guess the message is
>> hard to get through: dsymutil uses the object file relocations to know what
>> to link, but it doesn’t do relocation processing in the usual sense,
>> because this simply wouldn’t work (More precisely, it tries to do as much
>> standard relocation processing as possible, but it needs some code to
>> workaround the cases where that logic gives the wrong result).
> It's slowly sinking in, I appreciate your patience in (repeatedly)
> explaining it to me.
> I hope I didn’t come across as complaining about that. I was merely
> acknowledging that it’s very different from other platforms and thus hard
> to convey to people not working with that platform. I really appreciate
> your interest.