[llvm] r246406 - [dsymutil] Fix handling of inlined_subprogram low_pcs
David Blaikie via llvm-commits
llvm-commits at lists.llvm.org
Mon Sep 14 13:38:06 PDT 2015
On Thu, Sep 10, 2015 at 11:23 PM, Frédéric Riss <friss at apple.com> wrote:
>
> On Sep 10, 2015, at 10:40 PM, David Blaikie <dblaikie at gmail.com> wrote:
>
>
>
> On Thu, Sep 10, 2015 at 10:24 PM, Frédéric Riss <friss at apple.com> wrote:
>
>>
>> On Sep 10, 2015, at 10:01 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>
>>
>>
>> On Thu, Sep 10, 2015 at 9:46 PM, Frédéric Riss <friss at apple.com> wrote:
>>
>>> Sorry for the delay in replying, I got caught in other stuff.
>>>
>>> On Sep 9, 2015, at 2:39 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>>
>>>
>>>
>>> On Wed, Sep 9, 2015 at 2:23 PM, Frédéric Riss <friss at apple.com> wrote:
>>>
>>>>
>>>> On Sep 9, 2015, at 10:40 AM, David Blaikie <dblaikie at gmail.com> wrote:
>>>>
>>>>
>>>>
>>>> On Wed, Sep 9, 2015 at 9:52 AM, Frédéric Riss <friss at apple.com> wrote:
>>>>
>>>>>
>>>>> On Sep 9, 2015, at 8:36 AM, David Blaikie <dblaikie at gmail.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Sep 9, 2015 at 8:16 AM, Frédéric Riss <friss at apple.com> wrote:
>>>>>
>>>>>>
>>>>>> On Sep 8, 2015, at 10:09 PM, David Blaikie <dblaikie at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Sep 8, 2015 at 1:10 PM, Frédéric Riss <friss at apple.com>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>> On Sep 8, 2015, at 12:24 PM, David Blaikie <dblaikie at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Aug 31, 2015 at 11:10 AM, Frédéric Riss <friss at apple.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> On Aug 31, 2015, at 9:07 AM, David Blaikie <dblaikie at gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Aug 31, 2015 at 9:05 AM, David Blaikie <dblaikie at gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sun, Aug 30, 2015 at 6:43 PM, Frederic Riss via llvm-commits <
>>>>>>>>> llvm-commits at lists.llvm.org> wrote:
>>>>>>>>>
>>>>>>>>>> Author: friss
>>>>>>>>>> Date: Sun Aug 30 20:43:14 2015
>>>>>>>>>> New Revision: 246406
>>>>>>>>>>
>>>>>>>>>> URL: http://llvm.org/viewvc/llvm-project?rev=246406&view=rev
>>>>>>>>>> Log:
>>>>>>>>>> [dsymutil] Fix handling of inlined_subprogram low_pcs
>>>>>>>>>>
>>>>>>>>>> The value of an inlined subprogram low_pc attribute should not
>>>>>>>>>> get relocated, but it can happen that it matches the enclosing
>>>>>>>>>> function's start address and thus gets the generic treatment.
>>>>>>>>>> Special case it to avoid applying the PC offset twice.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I'm a tad confused - do you store the low_pcs as offsets relative
>>>>>>>>> to the function
>>>>>>>>>
>>>>>>>>
>>>>>>>> (sorry, bouncy shuttle to work & accidentally sent before I
>>>>>>>> finished that sentence...)
>>>>>>>>
>>>>>>>> do you store the low_pcs as offsets relative to the function's
>>>>>>>> low_pc? That's interesting - and perhaps something we should
>>>>>>>> standardize/generalize to reduce relocations in all our DWARF output (but I
>>>>>>>> don't think there's any standard for it yet in DWARF), but I'm not sure why
>>>>>>>> that would require special casing the case where the two low_pcs are equal
>>>>>>>> - wouldn't that just mean the low_pc of the inlined subroutine would be at
>>>>>>>> zero offset from the subprogram's low_pc? (& still not relocated)
>>>>>>>>
>>>>>>>>
>>>>>>>> dsymutil takes the debug map as input that only contains the
>>>>>>>> function (and variables) start addresses. That’s the only thing we can
>>>>>>>> count on being exact. We then do a pass over all the debug_info relocations
>>>>>>>> to find the ones that correspond to those addresses (and the DIEs where we
>>>>>>>> find the ‘interesting’ relocations are the ones that define which part of
>>>>>>>> the DIE tree we keep). Then — once we decided what to keep — we go over the
>>>>>>>> kept DIEs and we clone them, applying the relocations in the process. But
>>>>>>>> note that the relocations we’ve chosen are only for the entry points, thus
>>>>>>>> we need to have the code around to handle the
>>>>>>>> lexical_block/inlined_subroutine, and this code doesn’t use the relocations
>>>>>>>> (it applies an offset that we computed when handling the subprogram DIE).
>>>>>>>>
>>>>>>>> What happened here is that the generic code that applied the
>>>>>>>> relocations would also patch the inlined_subroutine low_pc because the
>>>>>>>> relocation was the same as the entry point. And then the code handling the
>>>>>>>> low_pc attributes for the inlined_subroutine would apply the offset a
>>>>>>>> second time.
>>>>>>>>
>>>>>>>
>>>>>>> OK - what I'm wondering is whether it would work better/as well to
>>>>>>> generalize this code, rather than two distinct passes/processes.
>>>>>>>
>>>>>>>
>>>>>>> I don’t think there’s a way to generalize this code. But I agree
>>>>>>> that storing the low_/high_pcs as offsets from their enclosing function
>>>>>>> low_pc would save quite a few relocations.
>>>>>>>
>>>>>>
>>>>>> Sorry, that wasn't what I was trying to describe,
>>>>>>
>>>>>>
>>>>>> I must admit that I didn’t really get your ‘2 distinct
>>>>>> passes/processes’ so I replied to your original point. But now I think I see
>>>>>> what you meant and I hope the rest of my answer did address that.
>>>>>>
>>>>>> but it's certainly something we've discussed before (actually I made
>>>>>> a silly prototype of using dwarf expressions and debug address pool
>>>>>> indices to do reloc sharing (using one reloc per section (macho would use
>>>>>> one reloc per function, due to the implied function-sections like behavior)
>>>>>> - never did get around to running good numbers on it, though)).
>>>>>>
>>>>>>
>>>>>>> Note that there is precedent for something like this: the ranges are
>>>>>>> encoded as offsets from the *CU* low_pc. Maybe it would be more natural to
>>>>>>> use that then?
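For reference, the DWARF 4 rule mentioned here works roughly as follows: each .debug_ranges entry is a pair of offsets from a base address that starts out as the CU's DW_AT_low_pc; an entry whose first value is the largest representable address is a base address selection entry, and its second value becomes the new base; a (0, 0) entry ends the list. The sketch below assumes 64-bit addresses and uses hypothetical names (RawRangeEntry, resolveRanges); it is not LLVM's range-list parser. With a CU low_pc of 0, every offset is simply an absolute address.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

// One raw DWARF 4 .debug_ranges entry: two address-sized values.
struct RawRangeEntry {
  uint64_t Start;
  uint64_t End;
};

// Turns a DWARF 4 range list into absolute [start, end) pairs.
std::vector<std::pair<uint64_t, uint64_t>>
resolveRanges(const std::vector<RawRangeEntry> &Entries, uint64_t CULowPC) {
  const uint64_t MaxAddr = std::numeric_limits<uint64_t>::max();
  uint64_t Base = CULowPC;  // default base: the CU's DW_AT_low_pc
  std::vector<std::pair<uint64_t, uint64_t>> Result;
  for (const RawRangeEntry &E : Entries) {
    if (E.Start == 0 && E.End == 0)
      break;                // end-of-list entry
    if (E.Start == MaxAddr) {
      Base = E.End;         // base address selection entry
      continue;
    }
    Result.emplace_back(Base + E.Start, Base + E.End);
  }
  return Result;
}
```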
>>>>>>>
>>>>>>
>>>>>> Note to myself: I said ‘more natural’ above, but I didn’t really mean
>>>>>> it (more in line with the standard would have been a better expression of
>>>>>> my thought). I never understood why the standard used the CU low_pc as a
>>>>>> base. It’s hard to use for the compiler (cf. the kludge we use by setting
>>>>>> the CU low_pc to 0 when we have multiple address ranges).
>>>>>>
>>>>>
>>>>> Do we still put the low_pc to 0 when we have DW_AT_ranges on the CU? I
>>>>> guess maybe we do - been a while since I looked. (debuggers should just
>>>>> have "no base" essentially, when the CU has ranges)
>>>>>
>>>>>
>>>>>> Maybe I’m missing something, but the start of the function would have
>>>>>> been much easier.
>>>>>>
>>>>>
>>>>> Yeah, I was thinking generalizing it a bit "you can use a constant
>>>>> address value which will be interpreted relative to the nearest enclosing
>>>>> low_pc" - so even if you have a split CU, but a contiguous subprogram, you
>>>>> can still share the low_pc of your subprogram. Or if you have a split
>>>>> subprogram but a contiguous CU (as in the hot/cold splitting case) you
>>>>> could still use that, etc. (this could happen further into subprograms too
>>>>> - split CU, split subprogram, but possibly a contiguous lexical block
>>>>> there, etc) - this wouldn't entirely minimize relocations, though - if you
>>>>> had a split subprogram and a similarly split lexical block - the lexical
>>>>> block ranges wouldn't share the base relocs of the subprogram's ranges
>>>>> relocs, for example. (or if you had a split subprogram, split CU, but
>>>>> contiguous lexical block - you still wouldn't get to share whichever
>>>>> subprogram/cu reloc refers to the chunk that the lexical block is in)
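The "nearest enclosing low_pc" encoding is only a proposal in this thread, not part of DWARF. Under that assumption, a minimal sketch of how a consumer might resolve such a constant, using a hypothetical Die type that records its parent and an already-resolved low_pc:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical DIE: a parent link and an already-resolved low_pc, if any.
struct Die {
  const Die *Parent = nullptr;
  std::optional<uint64_t> LowPC;
};

// Proposed, non-standard rule: a constant address value is an offset from the
// low_pc of the nearest enclosing DIE that has one. Returns std::nullopt when
// no ancestor has a low_pc (the encoding would then be invalid).
std::optional<uint64_t> resolveRelativeAddress(const Die &D, uint64_t Offset) {
  for (const Die *P = D.Parent; P != nullptr; P = P->Parent)
    if (P->LowPC)
      return *P->LowPC + Offset;
  return std::nullopt;
}
```

In the split CU / contiguous subprogram case, DIEs nested under the subprogram can use the encoding, while the subprogram itself cannot, because its CU has no low_pc.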
>>>>>
>>>>>
>>>>> Just out of curiosity: we do not do any form of function splitting (eg
>>>>> hot/cold partitioning) AFAIK, so all these ‘problems’ are mostly
>>>>> theoretical for now, right?
>>>>>
>>>>
>>>> Right - just thinking about how general of a feature we might want to
>>>> propose for DWARF standardization to address this issue.
>>>>
>>>> Currently all LLVM functions are contiguous, but CUs can be
>>>> discontiguous and lexical scopes can be discontiguous.
>>>>
>>>> So just "use the nearest parental low_pc" would still be suboptimal for
>>>> LLVM in some cases (not on MachO where function-sections-like behavior is
>>>> the only behavior so there's no cases where you share relocs between
>>>> functions) - several functions could share a reloc even if they're
>>>> discontiguous but in a single section (ie: a non-debug function may be
>>>> between two debug functions, but all in the same section together) or there
>>>> might be several contiguous functions but within a CU with a discontiguous
>>>> range (because there's a non-debug function elsewhere, or functions in
>>>> other sections (comdat, etc)).
>>>>
>>>>
>>>>>
>>>>> That's why the prototype I did was fission-based, because there's
>>>>> already address pooling implemented there (& we use fission anyway, so it
>>>>> was in the space I was thinking of). It'd still need some extensions for
>>>>> ranges, if I recall correctly, to allow ranges to use addr+offset as well.
>>>>> (& I don't really think using generalized dwarf expressions is the right
>>>>> solution for the addr+offset in DWARF attributes, but it was a fun way to
>>>>> prototype it)
>>>>>
>>>>>
>>>>>>
>>>>>> If we had a (probably/preferably compact) encoding to describe this,
>>>>>> it would probably be ideal.
>>>>>>
>>>>>> DWARF4 already has this /sort/ of thing for high_pc (where it can be
>>>>>> encoded as a static offset relative to the low_pc - so it's not another
>>>>>> relocation). That could possibly be generalized further to allow high_pcs
>>>>>> to be a static offset relative to their enclosing high_pc (if one exists,
>>>>>> otherwise it would be an unacceptable encoding (this could occur for
>>>>>> functions - if the CU isn't a contiguous PC range (non-CU functions in
>>>>>> between CU functions, functions in other sections, etc) or if a function
>>>>>> itself is discontiguous (hot/cold code splitting)).
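The existing DWARF 4 rule being generalized here is small enough to state as code: a DW_AT_high_pc of class address is an absolute address that needs its own relocation, while one of class constant is an unsigned offset from DW_AT_low_pc and needs none. HighPCClass and resolveHighPC are illustrative names only.

```cpp
#include <cassert>
#include <cstdint>

// The two attribute classes a DWARF 4 DW_AT_high_pc may have.
enum class HighPCClass { Address, Constant };

// Address class: already an absolute address (needs its own relocation).
// Constant class: an unsigned offset from DW_AT_low_pc (no relocation).
uint64_t resolveHighPC(uint64_t LowPC, uint64_t RawHighPC, HighPCClass Class) {
  return Class == HighPCClass::Constant ? LowPC + RawHighPC : RawHighPC;
}
```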
>>>>>>
>>>>>> Eric & I have bandied that around now & then, which led to the
>>>>>> aforementioned prototype I played around with, but didn't go any further
>>>>>> than that - my improvements to Clang's debug info emission had already
>>>>>> brought it down to half the size of GCC's, so we didn't have any particular
>>>>>> need to push further at the time.
>>>>>>
>>>>>>
>>>>>> Interesting to know.
>>>>>>
>>>>>>
>>>>>>> low_pc should just be a zero-offset relocation, right?
>>>>>>>
>>>>>>>
>>>>>>> Not on mach-o. Most relocations will be of the form __text+offset.
>>>>>>> That’s why there is no way for me to differentiate a __text+offset
>>>>>>> relocation that references the end of a range from the exact same
>>>>>>> relocation that
>>>>>>> references the beginning of another one (and as the linker can tear apart
>>>>>>> sections, that distinction is fundamental).
>>>>>>>
>>>>>>
>>>>>> OK, so you search through looking for a subprogram that has a
>>>>>> subprogram low_pc at __text+offset? then assume all the other low/high pcs
>>>>>> (and ranges) are relative to that function starting point? (this is how you
>>>>>> remove the ambiguity of the start/end?)
>>>>>>
>>>>>>
>>>>>> Basically yes. It’s a bit more complicated because it’s a multi-phase
>>>>>> process, but the end result is that while linking the DIEs we know if we
>>>>>> are in a function and we know its object file and its linked address. We
>>>>>> just apply that same offset to all the other object file addresses within
>>>>>> that function.
>>>>>>
>>>>>
>>>>> OK, I'll see if I can understand this/explain myself:
>>>>>
>>>>> It sounds like you search through for the subprogram DIE with the
>>>>> appropriate low_pc matching the debug map entry you received, then you
>>>>> update that low_pc, record the base offset of the subprogram and add that
>>>>> to all the address attributes in the subprogram?
>>>>>
>>>>> But you don't search for the low_pc of the subprogram, you just search
>>>>> for any low_pcs - update them all, then do the addition as a second pass.
>>>>>
>>>>>
>>>>> Not exactly (sorry I should have given a more detailed answer
>>>>> upfront). The phases I was referring to are:
>>>>> 1/ Scan the relocations (without touching the debug info contents) to
>>>>> look for interesting relocs (i.e. matching something in the debug map)
>>>>> 2/ Scan the debug info a first time to match these interesting relocs
>>>>> to subprogram low_pcs or variable locations (and from these ‘seeds’
>>>>> construct the full tree of DIEs to link)
>>>>> 3/ Clone/Link the selected DIEs
>>>>>
>>>>> 3/ is where everything happens. 1/ and 2/ are just here to gather
>>>>> information. During 3/ I apply the relocations *and* I use the special
>>>>> casing code to work around the bad attribute values it would generate.
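Phases 1/ and 2/ can be pictured as two filters, one over the relocation list and one over the subprogram DIEs. This is a sketch under heavily simplified, hypothetical types (Reloc, SubprogramDie, and a std::map standing in for the debug map), not dsymutil's data structures; phase 3/ is not modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Hypothetical simplified inputs.
struct Reloc {
  uint64_t DebugInfoOffset;  // where in .debug_info the relocation applies
  uint64_t TargetAddr;       // object-file address it refers to
};
struct SubprogramDie {
  uint64_t DieOffset;        // offset of the DW_TAG_subprogram DIE
  uint64_t LowPCAttrOffset;  // offset of its DW_AT_low_pc value
};
// Debug map: object-file function address -> linked address.
using DebugMap = std::map<uint64_t, uint64_t>;

// Phase 1/: look only at the relocations, keeping those whose target is an
// address present in the debug map.
std::vector<Reloc> findInterestingRelocs(const std::vector<Reloc> &Relocs,
                                         const DebugMap &DM) {
  std::vector<Reloc> Interesting;
  for (const Reloc &R : Relocs)
    if (DM.count(R.TargetAddr))
      Interesting.push_back(R);
  return Interesting;
}

// Phase 2/: a subprogram whose low_pc carries an interesting relocation is a
// seed; its dependencies would then be marked for linking.
std::set<uint64_t> findSeeds(const std::vector<SubprogramDie> &Subprograms,
                             const std::vector<Reloc> &Interesting) {
  std::set<uint64_t> RelocSites;
  for (const Reloc &R : Interesting)
    RelocSites.insert(R.DebugInfoOffset);
  std::set<uint64_t> Seeds;
  for (const SubprogramDie &SP : Subprograms)
    if (RelocSites.count(SP.LowPCAttrOffset))
      Seeds.insert(SP.DieOffset);
  return Seeds;
}
```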
>>>>>
>>>>
>>>> And I'm trying to understand how the clone/link leads to bad values
>>>> that need to be revised/treated specially.
>>>>
>>>> The way I would imagine/picture it is that 2/ finds the interesting
>>>> low_pcs, says "this DIE subtree is interesting, we should link it" then
>>>> while linking it, anything in the subtree would be updated using the same
>>>> generic address updating algorithm relative to the "interesting address"
>>>> from the debug map. I'm not seeing where this updating algorithm would need
>>>> special cases for high or low pc at that point. Where does the special
>>>> case/problematic results come in?
>>>>
>>>> I'll try to use more words...
>>>>
>>>> so looking at a simple example of two trivial (empty) functions
>>>> targeting osx (I picked a random triple from the Clang codegen
>>>> tests: -target x86_64-apple-macosx10.7.0) I get two CUs:
>>>>
>>>> DW_TAG_subprogram [2]
>>>> DW_AT_low_pc [DW_FORM_addr] (0x0000000000000000)
>>>> DW_AT_high_pc [DW_FORM_addr] (0x0000000000000006)
>>>> ...
>>>> DW_TAG_subprogram [2]
>>>> DW_AT_low_pc [DW_FORM_addr] (0x0000000000000010)
>>>> DW_AT_high_pc [DW_FORM_addr] (0x0000000000000016)
>>>> ...
>>>>
>>>> And, if I understand correctly there are no relocations or anything in
>>>> there - it's just literal fixed values.
>>>>
>>>>
>>>> No, those will all have relocations (but the way the relocations work
>>>> on mach-o, the content of the file where you should apply the relocation
>>>> actually corresponds to the address in the object file).
>>>>
>>>
>>> Ah, right, I was mixing up the relocations for addresses with the
>>> relocations for inter-section references within the DWARF, the latter are
>>> emitted as constants rather than relocs I think... nevermind, anyway.
>>>
>>>
>>>>
>>>> So the debug map must say things like "function at 0x0 actually ended
>>>> up at X, function at 0x10 actually ended up at Y”?
>>>>
>>>>
>>>> Yes, that’s it.
>>>>
>>>> I then picture the algorithm in 2/ identifying the first subprogram as
>>>> being at X, and the second subprogram being at Y.
>>>>
>>>>
>>>> In 2/, we just say “that subprogram has a relocation in its low_pc
>>>> attribute that corresponds to something described in the debug map, let’s
>>>> mark all its dependencies as required”.
>>>>
>>>
>>> What do you mean by "all its dependencies" in this case?
>>>
>>>
>>> The DIE dependencies are the transitive closure of the DIEs it refers
>>> to and its parent chain.
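One straightforward way to compute that closure is a worklist. This sketch assumes a hypothetical flat array of DieNode records, each storing its parent index and the indices of the DIEs its attributes reference; it follows only references and parent chains as described above, and does not model keeping a subprogram's children.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical DIE record: parent index and referenced DIEs
// (DW_AT_type, DW_AT_abstract_origin, ...).
struct DieNode {
  int Parent = -1;        // -1 for the CU root
  std::vector<int> Refs;  // indices of referenced DIEs
};

// Marks the seed plus the transitive closure of its references and parent
// chains.
std::set<int> markKept(const std::vector<DieNode> &Dies, int Seed) {
  std::set<int> Kept;
  std::vector<int> Worklist{Seed};
  while (!Worklist.empty()) {
    int Idx = Worklist.back();
    Worklist.pop_back();
    if (Idx < 0 || !Kept.insert(Idx).second)
      continue;  // no such DIE (above the root), or already visited
    const DieNode &D = Dies[static_cast<std::size_t>(Idx)];
    Worklist.push_back(D.Parent);
    for (int Ref : D.Refs)
      Worklist.push_back(Ref);
  }
  return Kept;
}
```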
>>>
>>>
>>>> As the DIE tree for the first subprogram is cloned, I would imagine X -
>>>> 0x0 is added to every address value. End result: low = X, high = X + 0x6
>>>>
>>>> As the DIE tree for the second subprogram is cloned, I would imagine Y
>>>> - 0x10 is added to every address value. End result: low = Y, high = Y + 0x6
>>>>
>>>> I don't see where the high or low pc special cases end up coming into
>>>> the equation. They seem to fall out, at least from the way I'm picturing it.
>>>>
>>>>
>>>> Before doing the special casing, I apply the relocations without even
>>>> looking at which attribute they correspond to. So you end up with the correct
>>>> low_pcs, and then I go through the DIE and patch up the high_pc with the
>>>> right offset. In this case, everything would work on your example, but
>>>> consider:
>>>>
>>>> DW_TAG_subprogram [2]
>>>> DW_AT_low_pc [DW_FORM_addr] (0x0000000000000000)
>>>> DW_AT_high_pc [DW_FORM_addr] (0x0000000000000010)
>>>> ...
>>>> DW_TAG_inlined_subroutine
>>>> DW_AT_low_pc [DW_FORM_addr] (0x0000000000000000)
>>>> ...
>>>> DW_TAG_subprogram [2]
>>>> DW_AT_low_pc [DW_FORM_addr] (0x0000000000000010)
>>>> DW_AT_high_pc [DW_FORM_addr] (0x0000000000000016)
>>>> ...
>>>>
>>>> Here the low_pc of the inlined subroutine has the same relocation as
>>>> the low_pc of the first subprogram and the high_pc of the first subprogram
>>>> has the same relocation as the low_pc of the second one. Generically
>>>> applying the interesting relocations will update all these places and then
>>>> we would update them again with the offsets if we do not special case them.
>>>>
>>>
>>> "update them again with the offsets" - I'm not sure I understand that
>>> bit. Could you describe it in more detail?
>>>
>>>
>>> Let’s develop the 2 examples we have. Let’s say that the debug map
>>> contains only 2 entries (and to reply to your other question below, the
>>> debug map only contains 1 entry per linked symbol):
>>> Function X: object address 0x0 -> linked address 0x1000
>>> Function Y: object address 0x10 -> linked address 0x2010
>>>
>>> Function X offset is 0x1000 and Function Y offset is 0x2000.
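The offsets above are the linked address minus the object-file address. A minimal sketch with a hypothetical DebugMapEntry type:

```cpp
#include <cassert>
#include <cstdint>

// One debug map entry per linked symbol (hypothetical struct).
struct DebugMapEntry {
  uint64_t ObjectAddr;  // function start in the object file
  uint64_t LinkedAddr;  // function start in the linked binary
};

// The PC offset applied to addresses inside the function. Unsigned
// (modular) arithmetic keeps this correct even if the function moved down.
uint64_t pcOffset(const DebugMapEntry &E) { return E.LinkedAddr - E.ObjectAddr; }

// Translates an object-file address that lies within the function.
uint64_t toLinked(uint64_t ObjAddr, const DebugMapEntry &E) {
  return ObjAddr + pcOffset(E);
}
```

For example, Function Y's high_pc at object address 0x16 maps to 0x16 + 0x2000 = 0x2016.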
>>>
>>> If we just apply the relocations corresponding to these 2 entries to
>>> your example, the low_pcs would get the correct values, but we need
>>> post-processing (applying the offsets to everything except the function
>>> low_pc) to get the right high_pcs:
>>>
>>
>> Great - thanks for fleshing out the example & explaining the phases!
>>
>> So a few questions: How does the post-processing work? Does it just walk
>> the whole DIE tree (for all the subprograms, etc) and whenever it sees a
>> low_pc it treats that as the offset for all other address values underneath
>> it? Do you keep track of the offset in some other way?
>>
>>
>> Yes, that’s what happens. When the DIE cloning algorithm enters a
>> function that has a debug map entry, it computes the PC offset and
>> propagates it recursively to the subtree.
>>
>
> Right, so if the low_pcs weren't pre-adjusted, this would come out of the
> algorithm cleanly (and if the algorithm were adjusted to apply /at/ the
> subprogram level, rather than only beneath it - then you wouldn't even need
> a fix for the high_pc either, I guess (currently the post processing being
> applied beneath the subprogram, rather than at the subprogram as well means
> the high_pc isn't updated?))
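The one-pass variant suggested here can be sketched as a recursive walk in which nothing is relocated ahead of time. Assumptions, all hypothetical: Node is a toy DIE with optional low_pc/high_pc values (an absolute, DWARF 2 style high_pc), and LinkedEntry is set only on subprograms that have a debug map entry. Applied to the modified example later in the thread (first function at 0x0 with an inlined subroutine at 0x0, second at 0x10), it yields 0x1000/0x1010, 0x1000 and 0x2010/0x2016 without special cases. It deliberately ignores the other relocated values (DW_AT_location, template parameter values) that come up in the reply.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Toy DIE (hypothetical): object-file address attributes, plus the linked
// entry address for subprograms that have a debug map entry.
struct Node {
  std::optional<uint64_t> LinkedEntry;  // set only on mapped subprograms
  std::optional<uint64_t> LowPC;
  std::optional<uint64_t> HighPC;       // DW_FORM_addr (absolute) high_pc
  std::vector<Node> Children;           // incomplete element type: OK in C++17
};

// Entering a mapped subprogram sets the offset, which is then added to
// low/high_pc at that level and everywhere below it.
void relocateSubtree(Node &N, std::optional<uint64_t> Offset) {
  if (N.LinkedEntry && N.LowPC)
    Offset = *N.LinkedEntry - *N.LowPC;  // computed before LowPC is updated
  if (Offset) {
    if (N.LowPC)
      *N.LowPC += *Offset;
    if (N.HighPC)
      *N.HighPC += *Offset;
  }
  for (Node &Child : N.Children)
    relocateSubtree(Child, Offset);
}
```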
>
>
> Yes, low/high_pc could be handled totally without applying the
> relocations. But, if you have a static DW_TAG_variable inside the function,
> you want to apply the relocations there in order to update its address.
>
> I made the choice to apply the relocations everywhere and to patch up the
> few corner cases that happen only for low/high_pc, rather than to special
> case where to apply the relocs.
>
OK, I'm starting to follow - thanks again for all the help/explanations
(including the comment below about offsets only being applied to high/low pc).
Could you point me to the code that does the "interesting relocation"
application? (it seems to me that it must know the attribute it's modifying,
but I may be missing something(s))
>
>
>> What I'm suggesting, is that rather than doing this in two passes
>> (touching all the "interesting relocs" once, then touching all the
>> addresses, then needing to fix up those that happened to be treated as
>> both) it could be done in one pass - don't actually apply the interesting
>> relocs, but instead record the desired offset for the subtree and use that
>> to perform the post-processing.
>>
>>
>> Yes, it could be done this way, but it’s done the current way for 2
>> reasons:
>> - We are speaking about low_pc/high_pc, but the relocation processing
>> also handles DW_AT_locations transparently without needing to dissect the
>> location expression, as well as other cases like the function pointer
>> template parameter value example that I gave in an earlier mail.
>>
>
> How does the function pointer template parameter value not end up with the
> duplicate offset handling too, though? (it'd be an address in the
> subprogram's DIE subtree… )
>
>
> We only apply the offset to low/high_pc attributes. The function address
> in a template parameter value will be in a DW_AT_const_value or something
> like that (I don’t need to care what attribute it’s in because the
> relocation will point at the right offset).
>
> Fred
>
>
> - It’s actually really cheap. Relocations are sorted by offset and the
>> DIE tree is traversed in ascending offset order, so to know if there is a
>> relocation to apply, a simple offset comparison is sufficient (no map
>> lookup or anything)
>>
>
>> Fred
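The cost argument can be illustrated with a cursor that advances through the sorted relocations as attributes are visited. This is a sketch assuming exactly the ordering described (relocations sorted by .debug_info offset, attributes visited in ascending offset order); Reloc and RelocCursor are hypothetical names, not dsymutil's.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A relocation: where it applies in .debug_info, and the value it produces.
struct Reloc {
  uint64_t Offset;
  uint64_t Value;
};

// Walks sorted relocations in lock-step with attribute offsets, so each query
// is an amortized O(1) comparison rather than a map lookup.
class RelocCursor {
public:
  explicit RelocCursor(const std::vector<Reloc> &SortedRelocs)
      : Relocs(SortedRelocs) {}

  // Returns the relocation applying exactly at AttrOffset, or nullptr.
  const Reloc *find(uint64_t AttrOffset) {
    assert(AttrOffset >= LastQuery && "attributes must be visited in order");
    LastQuery = AttrOffset;
    // Skip relocations for attributes already passed (or not kept).
    while (Next < Relocs.size() && Relocs[Next].Offset < AttrOffset)
      ++Next;
    if (Next < Relocs.size() && Relocs[Next].Offset == AttrOffset)
      return &Relocs[Next];
    return nullptr;
  }

private:
  const std::vector<Reloc> &Relocs;  // must outlive the cursor
  std::size_t Next = 0;
  uint64_t LastQuery = 0;
};
```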
>>
>> But for me to describe this more accurately (I realize it's rather
>> hand-wavey) I may need to better understand how the post-processing step
>> works.
>>
>> - Dave
>>
>>
>>>
>>> Non linked input:
>>> DW_TAG_subprogram [2]
>>> DW_AT_low_pc [DW_FORM_addr] (0x0000000000000000)
>>> DW_AT_high_pc [DW_FORM_addr] (0x0000000000000006)
>>> …
>>> DW_TAG_subprogram [2]
>>> DW_AT_low_pc [DW_FORM_addr] (0x0000000000000010)
>>> DW_AT_high_pc [DW_FORM_addr] (0x0000000000000016)
>>>
>>> After applying the interesting relocs:
>>> DW_TAG_subprogram [2]
>>> DW_AT_low_pc [DW_FORM_addr] (0x0000000000001000)
>>> DW_AT_high_pc [DW_FORM_addr] (0x0000000000000006)
>>> …
>>> DW_TAG_subprogram [2]
>>> DW_AT_low_pc [DW_FORM_addr] (0x0000000000002010)
>>> DW_AT_high_pc [DW_FORM_addr] (0x0000000000000016)
>>>
>>> After post-processing:
>>> DW_TAG_subprogram [2]
>>> DW_AT_low_pc [DW_FORM_addr] (0x0000000000001000)
>>> DW_AT_high_pc [DW_FORM_addr] (0x0000000000001006)
>>> …
>>> DW_TAG_subprogram [2]
>>> DW_AT_low_pc [DW_FORM_addr] (0x0000000000002010)
>>> DW_AT_high_pc [DW_FORM_addr] (0x0000000000002016)
>>>
>>>
>>> But things are not always that simple as can be seen in my modified
>>> example:
>>>
>>> Non linked input:
>>> DW_TAG_subprogram [2]
>>> DW_AT_low_pc [DW_FORM_addr] (0x0000000000000000)
>>> DW_AT_high_pc [DW_FORM_addr] (0x0000000000000010)
>>> …
>>> DW_TAG_inlined_subroutine
>>> DW_AT_low_pc [DW_FORM_addr] (0x0000000000000000)
>>> …
>>> DW_TAG_subprogram [2]
>>> DW_AT_low_pc [DW_FORM_addr] (0x0000000000000010)
>>> DW_AT_high_pc [DW_FORM_addr] (0x0000000000000016)
>>> ...
>>>
>>> After applying interesting relocs:
>>> DW_TAG_subprogram [2]
>>> DW_AT_low_pc [DW_FORM_addr] (0x0000000000001000)
>>> DW_AT_high_pc [DW_FORM_addr] (0x0000000000002010)
>>> …
>>> DW_TAG_inlined_subroutine
>>> DW_AT_low_pc [DW_FORM_addr] (0x0000000000001000)
>>> …
>>> DW_TAG_subprogram [2]
>>> DW_AT_low_pc [DW_FORM_addr] (0x0000000000002010)
>>> DW_AT_high_pc [DW_FORM_addr] (0x0000000000000016)
>>> ...
>>>
>>> If we blindly apply offsets everywhere except for the subprogram low_pc,
>>> we get:
>>> DW_TAG_subprogram [2]
>>> DW_AT_low_pc [DW_FORM_addr] (0x0000000000001000)
>>> DW_AT_high_pc [DW_FORM_addr] (0x0000000000003010)*
>>> …
>>> DW_TAG_inlined_subroutine
>>> DW_AT_low_pc [DW_FORM_addr] (0x0000000000002000)*
>>> …
>>> DW_TAG_subprogram [2]
>>> DW_AT_low_pc [DW_FORM_addr] (0x0000000000002010)
>>> DW_AT_high_pc [DW_FORM_addr] (0x0000000000002016)
>>> ...
>>>
>>> The 2 lines marked with * are wrong, because the relocation processing
>>> modified them and then we applied the offset. The usual case is the one you
>>> depicted, where only the function entry points have an interesting
>>> relocation. In that case, none of the low_pc/high_pc attributes except for
>>> the function low_pc have an interesting relocation. So all these attributes
>>> need to have the function offset applied to them. Except that sometimes
>>> these same attributes also get a relocation. The solution is simply to
>>> store the value of these attributes before relocation processing and use
>>> that as a base to apply the offset.
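With the debug map from this example (0x0 -> 0x1000, 0x10 -> 0x2010), the difference between the naive and the fixed post-processing fits in a few lines. This sketches the idea behind r246406 rather than its code: a std::map stands in for the interesting relocations, the function names are hypothetical, and the fixup applies to every low_pc/high_pc except the subprogram's own low_pc.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Stand-in for the interesting relocations: object address -> linked address.
using DebugMap = std::map<uint64_t, uint64_t>;

// Generic relocation processing: any value matching a debug map entry is
// replaced by the linked address, whatever attribute it sits in.
uint64_t applyInterestingReloc(uint64_t ObjValue, const DebugMap &DM) {
  auto It = DM.find(ObjValue);
  return It == DM.end() ? ObjValue : It->second;
}

// Buggy: adds the function's PC offset on top of the relocated value, so a
// coincidental match gets shifted twice.
uint64_t naiveFixup(uint64_t ObjValue, uint64_t FuncOffset, const DebugMap &DM) {
  return applyInterestingReloc(ObjValue, DM) + FuncOffset;
}

// Fixed: uses the value saved *before* relocation processing as the base.
uint64_t fixedFixup(uint64_t ObjValue, uint64_t FuncOffset) {
  return ObjValue + FuncOffset;
}
```

Here naiveFixup(0x10, 0x1000, DM) gives the wrong 0x3010 for the first function's high_pc and naiveFixup(0x0, 0x1000, DM) the wrong 0x2000 for the inlined subroutine, while fixedFixup gives 0x1010 and 0x1000.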
>>>
>>>
>>>>
>>>>
>>>>> Things I'm confused by:
>>>>>
>>>>> * Why does the second pass not touch the subprogram (how does the
>>>>> subprogram's high_pc get updated? Is that a special case? Does it need to
>>>>> be?)
>>>>>
>>>>>
>>>>> The high_pcs are special cased because of Dwarf2 where they hold
>>>>> addresses (and these addresses could correspond to the wrong thing). This
>>>>> patch added special casing for low_pcs very similar to what was already
>>>>> there for high_pcs.
>>>>>
>>>>> * Why is the low_pc (or are the low_pcs) updated eagerly, rather than
>>>>> deferring it to be handled with the second pass/addition code? (so then it
>>>>> wouldn't need a special case, with another special case on top to
>>>>> workaround it)
>>>>>
>>>>>
>>>>> To be extra clear, I don’t eagerly update low_pcs. I try to
>>>>> generically apply relocations which happens to be mostly about updating the
>>>>> low_pcs (I do that on every DIE that has a valid relocation, not only
>>>>> subprograms). But there are other cases like for example:
>>>>>
>>>>> void foo();
>>>>> template <void (*T)()> struct S {};
>>>>> S<foo> s;
>>>>>
>>>>> The debug info for the template value parameter will have a relocation
>>>>> for foo that I need to apply.
>>>>>
>>>>
>>>> OK, that's certainly interesting. And that could be exactly the same as
>>>> the high_pc (if it's the same as the low_pc, there's nothing bad there - it
>>>> should still be updated the same as every other address). So the question
>>>> is how to resolve the ambiguity within a subprogram tree.
>>>>
>>>>
>>>> There shouldn’t be any ambiguity here. We get a relocation for a
>>>> function that *must* be in the debug map. And the fact that this function’s
>>>> low_pc could be the same as the high_pc isn’t relevant, there simply is no
>>>> ambiguity.
>>>>
>>>
>>> What I mean is if you didn't look at the context (is this a high_pc or a
>>> low_pc and what subprogram is it in, etc) and just blindly applied the
>>> relocation, you wouldn't know whether the reference to 'foo' from within
>>> the definition of some function template was a reference to the end of the
>>> current function or the start of the next. You assume it's always the start
>>> of the next, then you correct all the subprogram high_pcs - correcting them
>>> from referring to the following function to instead refer to the end of the
>>> current one. I think...
>>>
>>>
>>>>
>>>> Currently you resolve this ambiguity by assuming all references except
>>>> the high_pc (any high_pc, or just the high_pc of the subprogram?) refer to
>>>> other entities (the next subprogram, etc), not the "one off the end" case?
>>>>
>>>>
>>>> All the high_pcs and also all the low_pcs that aren’t a subprogram
>>>> low_pc need special casing.
>>>>
>>>
>>> I'm still trying to understand why the low_pcs need special casing.
>>>
>>>
>>>>
>>>> Also global variable locations might have complex expressions that
>>>>> contain relocations. Instead of special casing all the attributes that
>>>>> might contain a relocation I just apply the relocations and then patch up
>>>>> the places that I know could be wrong.
>>>>>
>>>>
>>>> I'm trying to better understand which ones could be wrong and why, and
>>>> how you identify them.
>>>>
>>>> (and secondarily, specifically why the low_pc could be wrong)
>>>>
>>>>
>>>> I think I gave a example of that above. Let me state the issue a bit
>>>> differently: I want to apply all the relocations that correspond to debug
>>>> map entries.
>>>>
>>>
>>> Are debug map entries just for the start of each function? (so you'd
>>> update the low_pcs that correspond to the start of the function, but not
>>> any other low/high pcs inside inlined subroutines, lexical blocks, etc)
>>>
>>>
>>> Exactly.
>>>
>>> Then you do something else to update the rest? What's that something
>>> else/how does it work?
>>>
>>>
>>> Just applying the offsets as described above.
>>>
>>> Hope this helps,
>>> Fred
>>>
>>> I do that without looking at exactly where the relocation falls. Some of
>>>> the spots where these relocations were applied weren’t actually meant to
>>>> be relocated this way, but they were because their (object-file) addresses
>>>> matched something in the debug map. This can happen in 2 cases that I know
>>>> of: a block low_pc matching the enclosing function’s low_pc and a
>>>> function/block high_pc matching another function’s low_pc.
>>>>
>>>>
>>>>
>>>>> Doing it this way is forward-looking. For example, once I get back
>>>>> to submitting my default argument value debug info patch, we will have
>>>>> blobs in the debug info that might contain relocations. dsymutil shouldn’t
>>>>> need any updating to handle that because of the way it’s done.
>>>>> dsymutil would need updating if we add a new attribute containing a
>>>>> relocation that might be ambiguous,
>>>>>
>>>>
>>>> Ambiguous like high_pc, you mean? *nod* It seems reasonable to special
>>>> case that in some way, I haven't thought about it enough to know just how
>>>> special, but "high_pc within a given subprogram DIE tree is always relative
>>>> to that subprogram, not a reference to some other subprogram" seems OK (&
>>>> sounds like what you're doing).
>>>>
>>>>
>>>> Yes.
>>>>
>>>> Fred
>>>>
>>>> but my thinking is that this is much less likely than adding
>>>>> relocations that aren’t.
>>>>>
>>>>> Fred
>>>>>
>>>>>
>>>>>>
>>>>>>> Maybe I'm not understanding/explaining very well, though.
>>>>>>>
>>>>>>>
>>>>>>>> We might be able to completely remove any specific handling and
>>>>>>>> just ‘promote’ all the relocations that fall inside a linked function as
>>>>>>>> interesting. At the point where we triage relocs, we are not exploring
>>>>>>>> the DIE tree though, just looking at the relocation list, so it would
>>>>>>>> require us to trust the size field of the debug map, and I’m not sure we
>>>>>>>> can do that 100% of the time (I know that this field is not accurate, it’s
>>>>>>>> usually too big because it factors in alignment, but that might not be an
>>>>>>>> issue if nothing gets allocated in the alignment padding).
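The alternative mentioned here, promoting every relocation that falls inside a linked function, might look like the sketch below. It assumes a hypothetical map keyed by object-file start address and trusts the debug map size. One limitation is visible even in this toy: an end-of-function address (a high_pc) that coincides with the start of the next mapped function still resolves to that next function, so high_pc would still need separate handling.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

struct MappedFunction {
  uint64_t LinkedAddr;  // linked start address
  uint64_t Size;        // size from the debug map (may include padding)
};

// Hypothetical alternative: any relocation target inside [start, start + Size)
// of a mapped function is translated with that function's offset.
std::optional<uint64_t>
promoteToLinked(uint64_t Target, const std::map<uint64_t, MappedFunction> &Fns) {
  auto It = Fns.upper_bound(Target);  // first function starting after Target
  if (It == Fns.begin())
    return std::nullopt;              // Target precedes every function
  --It;                               // last function starting at or before it
  uint64_t Delta = Target - It->first;
  if (Delta >= It->second.Size)
    return std::nullopt;              // past the end of that function
  return It->second.LinkedAddr + Delta;
}
```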
>>>>>>>>
>>>>>>>
>>>>>>> Hmm - not sure I follow this. You're suggesting that if a
>>>>>>> non-debug-aware tool applied the relocations in the object file/debug info,
>>>>>>> it would mangle/damage the debug info?
>>>>>>>
>>>>>>>
>>>>>>> Basically yes. As I explain above a relocation based off the __text
>>>>>>> section with a constant offset could be replaced by different values
>>>>>>> depending on the context. I already said that, but I guess the message is
>>>>>>> hard to get through: dsymutil uses the object file relocations to know what
>>>>>>> to link, but it doesn’t do relocation processing in the usual sense,
>>>>>>> because this simply wouldn’t work (More precisely, it tries to do as much
>>>>>>> standard relocation processing as possible, but it needs some code to
>>>>>>> work around the cases where that logic gives the wrong result).
>>>>>>>
>>>>>>
>>>>>> It's slowly sinking in, I appreciate your patience in (repeatedly)
>>>>>> explaining it to me.
>>>>>>
>>>>>>
>>>>>> I hope I didn’t come through as complaining about that. I was merely
>>>>>> acknowledging that it’s very different from other platforms and thus hard
>>>>>> to convey to people not working with that platform. I really appreciate
>>>>>> your interest.
>>>>>>
>>>>>> Fred
>>>>>>
>>>>>
>