[llvm] r246406 - [dsymutil] Fix handling of inlined_subprogram low_pcs

Thu Sep 10 22:40:17 PDT 2015

On Thu, Sep 10, 2015 at 10:24 PM, Frédéric Riss <friss at apple.com> wrote:

>
> On Sep 10, 2015, at 10:01 PM, David Blaikie <dblaikie at gmail.com> wrote:
>
>
>
> On Thu, Sep 10, 2015 at 9:46 PM, Frédéric Riss <friss at apple.com> wrote:
>
>> Sorry for the delay in replying, I got caught in other stuff.
>>
>> On Sep 9, 2015, at 2:39 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>
>>
>>
>> On Wed, Sep 9, 2015 at 2:23 PM, Frédéric Riss <friss at apple.com> wrote:
>>
>>>
>>> On Sep 9, 2015, at 10:40 AM, David Blaikie <dblaikie at gmail.com> wrote:
>>>
>>>
>>>
>>> On Wed, Sep 9, 2015 at 9:52 AM, Frédéric Riss <friss at apple.com> wrote:
>>>
>>>>
>>>> On Sep 9, 2015, at 8:36 AM, David Blaikie <dblaikie at gmail.com> wrote:
>>>>
>>>>
>>>>
>>>> On Wed, Sep 9, 2015 at 8:16 AM, Frédéric Riss <friss at apple.com> wrote:
>>>>
>>>>>
>>>>> On Sep 8, 2015, at 10:09 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Sep 8, 2015 at 1:10 PM, Frédéric Riss <friss at apple.com> wrote:
>>>>>
>>>>>>
>>>>>> On Sep 8, 2015, at 12:24 PM, David Blaikie <dblaikie at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Aug 31, 2015 at 11:10 AM, Frédéric Riss <friss at apple.com>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>> On Aug 31, 2015, at 9:07 AM, David Blaikie <dblaikie at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Aug 31, 2015 at 9:05 AM, David Blaikie <dblaikie at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sun, Aug 30, 2015 at 6:43 PM, Frederic Riss via llvm-commits <
>>>>>>>> llvm-commits at lists.llvm.org> wrote:
>>>>>>>>
>>>>>>>>> Author: friss
>>>>>>>>> Date: Sun Aug 30 20:43:14 2015
>>>>>>>>> New Revision: 246406
>>>>>>>>>
>>>>>>>>> URL: http://llvm.org/viewvc/llvm-project?rev=246406&view=rev
>>>>>>>>> Log:
>>>>>>>>> [dsymutil] Fix handling of inlined_subprogram low_pcs
>>>>>>>>>
>>>>>>>>> The value of an inlined subprogram low_pc attribute should not
>>>>>>>>> get relocated, but it can happen that it matches the enclosing
>>>>>>>>> function's start address and thus gets the generic treatment.
>>>>>>>>> Special case it to avoid applying the PC offset twice.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I'm a tad confused - do you store the low_pcs as offsets relative
>>>>>>>> to the function
>>>>>>>>
>>>>>>>
>>>>>>> (sorry, bouncy shuttle to work & accidentally sent before I finished
>>>>>>> that sentence...)
>>>>>>>
>>>>>>> do you store the low_pcs as offsets relative to the function's
>>>>>>> low_pc? That's interesting - and perhaps something we should
>>>>>>> standardize/generalize to reduce relocations in all our DWARF output (but I
>>>>>>> don't think there's any standard for it yet in DWARF), but I'm not sure why
>>>>>>> that would require special casing the case where the two low_pcs are equal
>>>>>>> - wouldn't that just mean the low_pc of the inlined subroutine would be at
>>>>>>> zero offset from the subprogram's low_pc? (& still not relocated)
>>>>>>>
>>>>>>>
>>>>>>> dsymutil takes the debug map as input that only contains the
>>>>>>> function (and variables) start addresses. That’s the only thing we can
>>>>>>> count on being exact. We then do a pass over all the debug_info relocations
>>>>>>> to find the ones that correspond to those addresses (and the DIEs where we
>>>>>>> find the ‘interesting’ relocations are the ones that define which part of
>>>>>>> the DIE tree we keep). Then — once we decided what to keep — we go over the
>>>>>>> kept DIEs and we clone them, applying the relocations in the process. But
>>>>>>> note that the relocations we’ve chosen are only for the entry points, thus
>>>>>>> we need to have the code around to handle the
>>>>>>> lexical_block/inlined_subroutine, and this code doesn’t use the relocations
>>>>>>> (it applies an offset that we computed when handling the subprogram DIE).
>>>>>>>
>>>>>>> What happened here is that the generic code that applied the
>>>>>>> relocations would also patch the inlined_subroutine low_pc because the
>>>>>>> relocation was the same as the entry point. And then the code handling the
>>>>>>> low_pc attributes for the inlined_subroutine would apply the offset a
>>>>>>> second time.
>>>>>>>
>>>>>>
>>>>>> OK - what I'm wondering is whether it would work better/as well to
>>>>>> generalize this code, rather than two distinct passes/processes.
>>>>>>
>>>>>>
>>>>>> I don’t think there’s a way to generalize this code. But I agree that
>>>>>> storing the low_/high_pcs as offsets from their enclosing function low_pc
>>>>>> would save quite a few relocations.
>>>>>>
>>>>>
>>>>> Sorry, that wasn't what I was trying to describe,
>>>>>
>>>>>
>>>>> I must admit that I didn’t really get your ‘2 distinct
>>>>> passes/processes’ so I replied to your original point. But now I think I see
>>>>> what you meant and I hope the rest of my answer did address that.
>>>>>
>>>>> but it's certainly something we've discussed before (actually I made a
>>>>> silly prototype of using dwarf expressions and debug address pool indices
>>>>> to do reloc sharing (using one reloc per section (macho would use one reloc
>>>>> per function, due to the implied function-sections like behavior) - never
>>>>> did get around to running good numbers on it, though)).
>>>>>
>>>>>
>>>>>> Note that there is precedent for something like this: the ranges are
>>>>>> encoded as offsets from the *CU* low_pc. Maybe it would be more natural to
>>>>>> use that then?
>>>>>>
>>>>>
>>>>> Note to myself: I said ‘more natural’ above, but I didn’t really mean
>>>>> it (‘more in line with the standard’ would have been a better expression of
>>>>> my thought). I never understood why the standard used the CU low_pc as a
>>>>> base. It’s hard to use for the compiler (cf the kludge we use by setting
>>>>> the CU low_pc to 0 when we have multiple address ranges).
>>>>>
>>>>
>>>> Do we still put the low_pc to 0 when we have DW_AT_ranges on the CU? I
>>>> guess maybe we do - been a while since I looked. (debuggers should just
>>>> have "no base" essentially, when the CU has ranges)
>>>>
>>>>
>>>>> Maybe I’m missing something, but the start of the function would have
>>>>> been much easier.
>>>>>
>>>>
>>>> Yeah, I was thinking generalizing it a bit "you can use a constant
>>>> address value which will be interpreted relative to the nearest enclosing
>>>> low_pc" - so even if you have a split CU, but a contiguous subprogram, you
>>>> can still share the low_pc of your subprogram. Or if you have a split
>>>> subprogram but a contiguous CU (as in the hot/cold splitting case) you
>>>> could still use that, etc. (this could happen further into subprograms too
>>>> - split CU, split subprogram, but possibly a contiguous lexical block
>>>> there, etc) - this wouldn't entirely minimize relocations, though - if you
>>>> had a split subprogram and a similarly split lexical block - the lexical
>>>> block ranges wouldn't share the base relocs of the subprogram's ranges
>>>> relocs, for example. (or if you had a split subprogram, split CU, but
>>>> contiguous lexical block - you still wouldn't get to share whichever
>>>> subprogram/cu reloc refers to the chunk that the lexical block is in)
>>>>
>>>>
>>>> Just out of curiosity: we do not do any form of function splitting (eg
>>>> hot/cold partitioning) AFAIK, so all these ‘problems’ are mostly
>>>> theoretical for now, right?
>>>>
>>>
>>> Right - just thinking about how general of a feature we might want to
>>> propose for DWARF standardization to address this issue.
>>>
>>> Currently all LLVM functions are contiguous, but CUs can be
>>> discontiguous and lexical scopes can be discontiguous.
>>>
>>> So just "use the nearest parental low_pc" would still be suboptimal for
>>> LLVM in some cases (not on MachO where function-sections-like behavior is
>>> the only behavior so there's no cases where you share relocs between
>>> functions) - several functions could share a reloc even if they're
>>> discontiguous but in a single section (ie: a non-debug function may be
>>> between two debug functions, but all in the same section together) or there
>>> might be several contiguous functions but within a CU with a discontiguous
>>> range (because there's a non-debug function elsewhere, or functions in
>>> other sections (comdat, etc)).
>>>
>>>
>>>>
>>>> That's why the prototype I did was fission-based, because there's
>>>> already address pooling implemented there (& we use fission anyway, so it
>>>> was in the space I was thinking of). It'd still need some extensions for
>>>> ranges, if I recall correctly, to allow ranges to use addr+offset as well.
>>>> (& I don't really think using generalized dwarf expressions is the right
>>>> solution for the addr+offset in DWARF attributes, but it was a fun way to
>>>> prototype it)
>>>>
>>>>
>>>>>
>>>>> If we had a (probably/preferably compact) encoding to describe this,
>>>>> it would probably be ideal.
>>>>>
>>>>> DWARF4 already has this /sort/ of thing for high_pc (where it can be
>>>>> encoded as a static offset relative to the low_pc - so it's not another
>>>>> relocation). That could possibly be generalized further to allow high_pcs
>>>>> to be a static offset relative to their enclosing high_pc (if one exists,
>>>>> otherwise it would be an unacceptable encoding (this could occur for
>>>>> functions - if the CU isn't a contiguous PC range (non-CU functions in
>>>>> between CU functions, functions in other sections, etc) or if a function
>>>>> itself is discontiguous (hot/cold code splitting)).
>>>>>
>>>>> Eric & I have bandied that around now & then, which led to the
>>>>> aforementioned prototype I played around with, but didn't go any further
>>>>> than that - my improvements to Clang's debug info emission had already
>>>>> brought it down to half the size of GCC's, so we didn't have any particular
>>>>> need to push further at the time.
>>>>>
>>>>>
>>>>> Interesting to know.
>>>>>
>>>>>
>>>>>> low_pc should just be a zero-offset relocation, right?
>>>>>>
>>>>>>
>>>>>> Not on mach-o. Most relocations will be of the form __text+offset.
>>>>>> That’s why there is no way for me to differentiate a __text+offset that
>>>>>> references the end of a range from the exact same relocation that
>>>>>> references the beginning of another one (and as the linker can tear apart
>>>>>> sections, that distinction is fundamental).
>>>>>>
>>>>>
>>>>> OK, so you search through looking for a subprogram that has a
>>>>> subprogram low_pc at __text+offset? then assume all the other low/high pcs
>>>>> (and ranges) are relative to that function starting point? (this is how you
>>>>> remove the ambiguity of the start/end?)
>>>>>
>>>>>
>>>>> Basically yes. It’s a bit more complicated because it’s a multi-phase
>>>>> process, but the end result is that while linking the DIEs we know if we
>>>>> are in a function and we know its object-file address and its linked address. We
>>>>> just apply that same offset to all the other object file addresses within
>>>>> that function.
>>>>>
>>>>
>>>> OK, I'll see if I can understand this/explain myself:
>>>>
>>>> It sounds like you search through for the subprogram DIE with the
>>>> appropriate low_pc matching the debug map entry you received, then you
>>>> update that low_pc, record the base offset of the subprogram and add that
>>>> to all the address attributes in the subprogram?
>>>>
>>>> But you don't search for the low_pc of the subprogram, you just search
>>>> for any low_pcs - update them all, then do the addition as a second pass.
>>>>
>>>>
>>>> Not exactly (sorry I should have given a more detailed answer upfront).
>>>> The phases I was referring to are:
>>>>  1/ Scan the relocations (without touching the debug info contents) to
>>>> look for interesting relocs (i.e. matching something in the debug map)
>>>>  2/ Scan the debug info a first time to match these interesting relocs
>>>> to subprogram low_pcs or variable locations (and from these ‘seeds’
>>>> construct the full tree of DIEs to link)
>>>>  3/ Clone/Link the selected DIEs
>>>>
>>>> 3/ is where everything happens. 1/ and 2/ are just here to gather
>>>> information. During 3/ I apply the relocations *and* I use the special
>>>> casing code to work around the bad attribute values it would generate.
>>>>
>>>
>>> And I'm trying to understand how the clone/link leads to bad values that
>>> need to be revised/treated specially.
>>>
>>> The way I would imagine/picture it is that 2/ finds the interesting
>>> low_pcs, says "this DIE subtree is interesting, we should link it" then
>>> while linking it, anything in the subtree would be updated using the same
>>> generic address updating algorithm relative to the "interesting address"
>>> from the debug map. I'm not seeing where this updating algorithm would need
>>> special cases for high or low pc at that point. Where does the special
>>> case/problematic results come in?
>>>
>>> I'll try to use more words...
>>>
>>> so looking at a simple example of two trivial (empty) functions
>>> targeting osx (I picked a random triple from the Clang codegen
>>> tests: -target x86_64-apple-macosx10.7.0) I get two CUs:
>>>
>>>   DW_TAG_subprogram [2]
>>>     DW_AT_low_pc [DW_FORM_addr]     (0x0000000000000000)
>>>     DW_AT_high_pc [DW_FORM_addr]    (0x0000000000000006)
>>>     ...
>>>   DW_TAG_subprogram [2]
>>>     DW_AT_low_pc [DW_FORM_addr]     (0x0000000000000010)
>>>     DW_AT_high_pc [DW_FORM_addr]    (0x0000000000000016)
>>>     ...
>>>
>>> And, if I understand correctly there are no relocations or anything in
>>> there - it's just literal fixed values.
>>>
>>>
>>> No, those will all have relocations (but the way the relocations work on
>>> mach-o, the content of the file where you should apply the relocation
>>> actually corresponds to the address in the object file).
>>>
>>
>> Ah, right, I was mixing up the relocations for addresses with the
>> relocations for inter-section references within the DWARF, the latter are
>> emitted as constants rather than relocs I think... nevermind, anyway.
>>
>>
>>>
>>> So the debug map must say things like "function at 0x0 actually ended up
>>> at X, function at 0x10 actually ended up at Y”?
>>>
>>>
>>> Yes, that’s it.
>>>
>>> I then picture the algorithm in 2/ identifying the first subprogram as
>>> being at X, and the second subprogram being at Y.
>>>
>>>
>>> In 2/, we just say “that subprogram has a relocation in its low_pc
>>> attribute that corresponds to something described in the debug map, let’s
>>> mark all its dependencies as required”.
>>>
>>
>> What do you mean by "all its dependencies" in this case?
>>
>>
>> A DIE's dependencies are the transitive closure of the DIEs it refers
>> to, plus its parent chain.
>>
>>
>>> As the DIE tree for the first subprogram is cloned, I would imagine X -
>>> 0x0 is added to every address value. End result: low = X, high = X + 0x6
>>>
>>> As the DIE tree for the second subprogram is cloned, I would imagine Y -
>>> 0x10 is added to every address value. End result: low = Y, high = Y + 0x6
>>>
>>> I don't see where the high or low pc special cases end up coming into
>>> the equation. They seem to fall out, at least from the way I'm picturing it.
>>>
>>>
>>> Before doing the special casing, I apply the relocations without even
>>> looking at which attribute they correspond to. So you end up with the correct
>>> low_pcs, and then I go through the DIE and patch up the high_pc with the
>>> right offset. In this case, everything would work on your example, but
>>> consider:
>>>
>>>   DW_TAG_subprogram [2]
>>>     DW_AT_low_pc [DW_FORM_addr]     (0x0000000000000000)
>>>     DW_AT_high_pc [DW_FORM_addr]    (0x0000000000000010)
>>>     ...
>>>     DW_TAG_inlined_subroutine
>>>       DW_AT_low_pc [DW_FORM_addr]     (0x0000000000000000)
>>>     ...
>>>  DW_TAG_subprogram [2]
>>>     DW_AT_low_pc [DW_FORM_addr]     (0x0000000000000010)
>>>     DW_AT_high_pc [DW_FORM_addr]    (0x0000000000000016)
>>>     ...
>>>
>>> Here the low_pc of the inlined subroutine has the same relocation as the
>>> low_pc of the first subprogram and the high_pc of the first subprogram has
>>> the same relocation as the low_pc of the second one. Generically applying
>>> the interesting relocations will update all these places and then we would
>>> update them again with the offsets if we do not special case them.
>>>
>>
>> "update them again with the offsets" - I'm not sure I understand that
>> bit. Could you describe it in more detail?
>>
>>
>> Let’s develop the 2 examples we have. Let’s say that the debug map
>> contains only 2 entries (and to reply to your other question below, the
>> debug map only contains 1 entry per linked symbol):
>> Function X: object address 0x0 -> linked address 0x1000
>> Function Y: object address 0x10 -> linked address 0x2010
>>
>> Function X offset is 0x1000 and Function Y offset is 0x2000.
>>
>> If we just apply the relocations corresponding to these 2 entries to your
>> example, the low_pcs would get the correct values, but we need
>> post-processing (applying the offsets to everything except the function
>> low_pc) to get the right high_pcs:
>>
>
> Great - thanks for fleshing out the example & explaining the phases!
>
> So a few questions: How does the post-processing work? Does it just walk
> the whole DIE tree (for all the subprograms, etc) and whenever it sees a
> low_pc it treats that as the offset for all other address values underneath
> it? Do you keep track of the offset in some other way?
>
>
> Yes, that’s what happens. When the DIE cloning algorithm enters a function
> that has a debug map entry, it computes the PC offset and propagates it
> recursively to the subtree.
>

Right, so if the low_pcs weren't pre-adjusted, this would come out of the
algorithm cleanly (and if the algorithm were adjusted to apply /at/ the
subprogram level, rather than only beneath it, then you wouldn't even need
a fix for the high_pc either, I guess. Currently the post-processing is
applied beneath the subprogram rather than at the subprogram as well, which
means the high_pc isn't updated?)
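
To check that I'm picturing the arithmetic correctly, here is a minimal C++
sketch of the inlined_subroutine example quoted further down. It is only an
illustration under stated assumptions: the names (Attr, debugMap,
pcOffsetFor, applyInterestingReloc, blindOffset, offsetFromOriginal) are
made up for this mail and are not dsymutil's actual code, and the debug map
is assumed to hold just the two entries from the example (0x0 -> 0x1000 and
0x10 -> 0x2010).

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical model of one address attribute (not dsymutil's real type).
struct Attr {
  uint64_t ObjAddr; // value as found in the object file
  uint64_t Value;   // value after the processing applied so far
};

// Debug map from the example: object-file address -> linked address.
const std::map<uint64_t, uint64_t> &debugMap() {
  static const std::map<uint64_t, uint64_t> Map = {{0x0, 0x1000},
                                                   {0x10, 0x2010}};
  return Map;
}

// PC offset of a function, computed from its debug map entry.
uint64_t pcOffsetFor(uint64_t FuncObjAddr) {
  auto It = debugMap().find(FuncObjAddr);
  assert(It != debugMap().end() && "function must have a debug map entry");
  return It->second - It->first;
}

// Generic relocation step: any attribute whose object-file address matches a
// debug map entry gets patched, whatever kind of attribute it is.
void applyInterestingReloc(Attr &A) {
  auto It = debugMap().find(A.ObjAddr);
  if (It != debugMap().end())
    A.Value = It->second;
}

// Problematic post-processing: add the PC offset to the current value, which
// may already have been patched by applyInterestingReloc.
uint64_t blindOffset(const Attr &A, uint64_t PCOffset) {
  return A.Value + PCOffset;
}

// Post-processing based on the saved pre-relocation value instead.
uint64_t offsetFromOriginal(const Attr &A, uint64_t PCOffset) {
  return A.ObjAddr + PCOffset;
}
```

With the first subprogram's high_pc (object address 0x10) and the inlined
subroutine's low_pc (object address 0x0), blindOffset yields 0x3010 and
0x2000, the two starred values in the quoted example, while
offsetFromOriginal yields 0x1010 and 0x1000.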

>
> What I'm suggesting, is that rather than doing this in two passes
> (touching all the "interesting relocs" once, then touching all the
> addresses, then needing to fix up those that happened to be treated as
> both) it could be done in one pass - don't actually apply the interesting
> relocs, but instead record the desired offset for the subtree and use that
> to perform the post-processing.
>
>
> Yes, it could be done this way, but it’s done the current way for 2
> reasons:
>  - We are speaking about low_pc/high_pc, but the relocation processing
> also handles DW_AT_locations transparently without needing to dissect the
> location expression, as well as other cases like the function pointer
> template parameter value example that I gave in an earlier mail.
>

How does the function pointer template parameter value not end up with the
duplicate offset handling too, though? (it'd be an address in the
subprogram's DIE subtree... )

>  - It’s actually really cheap. Relocations are sorted by offset and the
> DIE tree is traversed in ascending offset order, so to know if there is a
> relocation to apply, a simple offset comparison is sufficient (no map
> lookup or anything)
>
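
A minimal sketch of how such an offset comparison could look, assuming the
relocations are kept in a vector sorted by their .debug_info offset and the
attributes are visited in ascending offset order. Reloc, RelocCursor and
next are hypothetical names chosen for illustration; this is not dsymutil's
actual interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical relocation record: where in .debug_info it applies and the
// linked address to write there.
struct Reloc {
  uint64_t Offset;
  uint64_t LinkedAddr;
};

// Walks a sorted relocation list in step with a traversal that visits
// attributes in ascending offset order.
class RelocCursor {
public:
  explicit RelocCursor(std::vector<Reloc> Sorted) : Relocs(std::move(Sorted)) {
    for (size_t I = 1; I < Relocs.size(); ++I)
      assert(Relocs[I - 1].Offset <= Relocs[I].Offset && "must be sorted");
  }

  // Returns the relocation that falls in [Start, End), or nullptr if there is
  // none. Relocations before Start are skipped for good, because the
  // traversal never goes backwards.
  const Reloc *next(uint64_t Start, uint64_t End) {
    while (Pos < Relocs.size() && Relocs[Pos].Offset < Start)
      ++Pos;
    if (Pos < Relocs.size() && Relocs[Pos].Offset < End)
      return &Relocs[Pos++];
    return nullptr;
  }

private:
  std::vector<Reloc> Relocs;
  size_t Pos = 0;
};
```

Each query only compares offsets and moves the cursor forward, so the whole
traversal costs time linear in the number of relocations plus the number of
attributes, with no map lookup.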

> Fred
>
> But for me to describe this more accurately (I realize it's rather
> hand-wavey) I may need to better understand how the post-processing step
> works.
>
> - Dave
>
>
>>
>> Non linked input:
>>   DW_TAG_subprogram [2]
>>     DW_AT_low_pc [DW_FORM_addr]     (0x0000000000000000)
>>     DW_AT_high_pc [DW_FORM_addr]    (0x0000000000000006)
>>     …
>>   DW_TAG_subprogram [2]
>>     DW_AT_low_pc [DW_FORM_addr]     (0x0000000000000010)
>>     DW_AT_high_pc [DW_FORM_addr]    (0x0000000000000016)
>>
>> After applying the interesting relocs:
>>   DW_TAG_subprogram [2]
>>     DW_AT_low_pc [DW_FORM_addr]     (0x0000000000001000)
>>     DW_AT_high_pc [DW_FORM_addr]    (0x0000000000000006)
>>     …
>>   DW_TAG_subprogram [2]
>>     DW_AT_low_pc [DW_FORM_addr]     (0x0000000000002010)
>>     DW_AT_high_pc [DW_FORM_addr]    (0x0000000000000016)
>>
>> After post-processing:
>>   DW_TAG_subprogram [2]
>>     DW_AT_low_pc [DW_FORM_addr]     (0x0000000000001000)
>>     DW_AT_high_pc [DW_FORM_addr]    (0x0000000000001006)
>>     …
>>   DW_TAG_subprogram [2]
>>     DW_AT_low_pc [DW_FORM_addr]     (0x0000000000002010)
>>     DW_AT_high_pc [DW_FORM_addr]    (0x0000000000002016)
>>
>>
>> But things are not always that simple as can be seen in my modified
>> example:
>>
>> Non linked input:
>>   DW_TAG_subprogram [2]
>>     DW_AT_low_pc [DW_FORM_addr]     (0x0000000000000000)
>>     DW_AT_high_pc [DW_FORM_addr]    (0x0000000000000010)
>>     …
>>     DW_TAG_inlined_subroutine
>>       DW_AT_low_pc [DW_FORM_addr]     (0x0000000000000000)
>>     …
>>  DW_TAG_subprogram [2]
>>     DW_AT_low_pc [DW_FORM_addr]     (0x0000000000000010)
>>     DW_AT_high_pc [DW_FORM_addr]    (0x0000000000000016)
>>     ...
>>
>> After applying interesting relocs:
>>   DW_TAG_subprogram [2]
>>     DW_AT_low_pc [DW_FORM_addr]     (0x0000000000001000)
>>     DW_AT_high_pc [DW_FORM_addr]    (0x0000000000002010)
>>     …
>>     DW_TAG_inlined_subroutine
>>       DW_AT_low_pc [DW_FORM_addr]     (0x0000000000001000)
>>     …
>>  DW_TAG_subprogram [2]
>>     DW_AT_low_pc [DW_FORM_addr]     (0x0000000000002010)
>>     DW_AT_high_pc [DW_FORM_addr]    (0x0000000000000016)
>>     ...
>>
>> If we blindly apply offsets everywhere except for the subprogram low_pc,
>> we get:
>>   DW_TAG_subprogram [2]
>>     DW_AT_low_pc [DW_FORM_addr]     (0x0000000000001000)
>>     DW_AT_high_pc [DW_FORM_addr]    (0x0000000000003010)*
>>     …
>>     DW_TAG_inlined_subroutine
>>       DW_AT_low_pc [DW_FORM_addr]     (0x0000000000002000)*
>>     …
>>  DW_TAG_subprogram [2]
>>     DW_AT_low_pc [DW_FORM_addr]     (0x0000000000002010)
>>     DW_AT_high_pc [DW_FORM_addr]    (0x0000000000002016)
>>     ...
>>
>> The 2 lines marked with * are wrong, because the relocation processing
>> modified them and then we applied the offset on top. The usual case is the one you
>> depicted, where only the function entry points have an interesting
>> relocation. In that case, none of the low_pc/high_pc attributes except for
>> the function low_pc have an interesting relocation. So all these attributes
>> need to have the function offset applied to them. Except that sometimes
>> these same attributes also get a relocation. The solution is simply to
>> store the value of these attributes before relocation processing and use
>> that as a base to apply the offset.
>>
>>
>>>
>>>
>>>> Things I'm confused by:
>>>>
>>>> * Why does the second pass not touch the subprogram (how does the
>>>> subprogram's high_pc get updated? Is that a special case? Does it need to
>>>> be?)
>>>>
>>>>
>>>> The high_pcs are special cased because of Dwarf2 where they hold
>>>> addresses (and these addresses could correspond to the wrong thing). This
>>>> patch added special casing for low_pcs very similar to what was already
>>>> there for high_pcs.
>>>>
>>>> * Why does the low_pc (or low_pcs) get updated eagerly, rather than
>>>> deferring it to be handled with the second pass/addition code? (so then it
>>>> wouldn't need a special case, with another special case on top to
>>>> workaround it)
>>>>
>>>>
>>>> To be extra clear, I don’t eagerly update low_pcs. I try to generically
>>>> apply relocations which happens to be mostly about updating the low_pcs (I
>>>> do that on every DIE that has a valid relocation, not only subprograms).
>>>> But there are other cases like for example:
>>>>
>>>> void foo();
>>>> template <void (*T)()> struct S {};
>>>> S<foo> s;
>>>>
>>>> The debug info for the template value parameter will have a relocation
>>>> for foo that I need to apply.
>>>>
>>>
>>> OK, that's certainly interesting. And that could be exactly the same as
>>> the high_pc (if it's the same as the low_pc, there's nothing bad there - it
>>> should still be updated the same as every other address). So the question
>>> is how to resolve the ambiguity within a subprogram tree.
>>>
>>>
>>> There shouldn’t be any ambiguity here. We get a relocation for a
>>> function that *must* be in the debug map. And the fact that this function’s
>>> low_pc could be the same as the high_pc isn’t relevant, there simply is no
>>> ambiguity.
>>>
>>
>> What I mean is if you didn't look at the context (is this a high_pc or a
>> low_pc and what subprogram is it in, etc) and just blindly applied the
>> relocation, you wouldn't know whether the reference to 'foo' from within
>> the definition of some function template was a reference to the end of the
>> current function or the start of the next. You assume it's always the start
>> of the next, then you correct all the subprogram high_pcs - correcting them
>> from referring to the following function to instead refer to the end of the
>> current one. I think...
>>
>>
>>>
>>> Currently you resolve this ambiguity by assuming all references except
>>> the high_pc (any high_pc, or just the high_pc of the subprogram?) refer to
>>> other entities (the next subprogram, etc), not the "one off the end" case?
>>>
>>>
>>> All the high_pcs and also all the low_pcs that aren’t a subprogram
>>> low_pc need special casing.
>>>
>>
>> I'm still trying to understand why the low_pcs need special casing.
>>
>>
>>>
>>> Also global variable locations might have complex expressions that
>>>> contain relocations. Instead of special casing all the attributes that
>>>> might contain a relocation I just apply the relocations and then patch up
>>>> the places that I know could be wrong.
>>>>
>>>
>>> I'm trying to better understand which ones could be wrong and why, and
>>> how you identify them.
>>>
>>> (and secondarily, specifically why the low_pc could be wrong)
>>>
>>>
>>> I think I gave an example of that above. Let me state the issue a bit
>>> differently: I want to apply all the relocations that correspond to debug
>>> map entries.
>>>
>>
>> Are debug map entries just for the start of each function? (so you'd
>> update the low_pcs that correspond to the start of the function, but not
>> any other low/high pcs inside inlined subroutines, lexical blocks, etc)
>>
>>
>> Exactly.
>>
>> Then you do something else to update the rest? What's that something
>> else/how does it work?
>>
>>
>> Just applying the offsets as described above.
>>
>> Hope this helps,
>> Fred
>>
>> I do that without looking at exactly where the relocation falls. Some of
>>> the spots where these relocations were applied weren’t actually meant to
>>> be relocated this way, but they were because their (object-file) addresses
>>> matched something in the debug map. This can happen in 2 cases that I know
>>> of: a block low_pc matching the enclosing function’s low_pc and a
>>> function/block high_pc matching another function’s low_pc.
>>>
>>>
>>>
>>>> Doing it this way is forward looking. For example when I once get back
>>>> to submitting my default argument value debug info patch, we will have
>>>> blobs in the debug info that might contain relocations. dsymutil shouldn’t
>>>> need any updating to handle that because of the way it’s done.
>>>> dsymutil would need updating if we add a new attribute containing a
>>>> relocation that might be ambiguous,
>>>>
>>>
>>> Ambiguous like high_pc, you mean? *nod* It seems reasonable to special
>>> case that in some way, I haven't thought about it enough to know just how
>>> special, but "high_pc within a given subprogram DIE tree is always relative
>>> to that subprogram, not a reference to some other subprogram" seems OK (&
>>> sounds like what you're doing).
>>>
>>>
>>> Yes.
>>>
>>> Fred
>>>
>>> but my thinking is that this is much less likely than adding relocations
>>>> that aren’t.
>>>>
>>>> Fred
>>>>
>>>>
>>>>>
>>>>>> Maybe I'm not understanding/explaining very well, though.
>>>>>>
>>>>>>
>>>>>>> We might be able to completely remove any specific handling and just
>>>>>>> ‘promote’ all the relocations that fall inside a linked function as
>>>>>>> interesting. At the point we do that triaging relocs, we are not exploring
>>>>>>> the DIE tree though, just looking at the relocation list, so it would
>>>>>>> require us to trust the size field of the debug map, and I’m not sure we
>>>>>>> can do that 100% of the time (I know that this field is not accurate, it’s
>>>>>>> usually too big because it factors in alignment, but that might not be an
>>>>>>> issue if nothing gets allocated in the alignment padding).
>>>>>>>
>>>>>>
>>>>>> Hmm - not sure I follow this. You're suggesting that if a
>>>>>> non-debug-aware tool applied the relocations in the object file/debug info,
>>>>>> it would mangle/damage the debug info?
>>>>>>
>>>>>>
>>>>>> Basically yes. As I explain above a relocation based off the __text
>>>>>> section with a constant offset could be replaced by different values
>>>>>> depending on the context. I already said that, but I guess the message is
>>>>>> hard to get through: dsymutil uses the object file relocations to know what
>>>>>> to link, but it doesn’t do relocation processing in the usual sense,
>>>>>> because this simply wouldn’t work (More precisely, it tries to do as much
>>>>>> standard relocation processing as possible, but it needs some code to
>>>>>> work around the cases where that logic gives the wrong result).
>>>>>>
>>>>>
>>>>> It's slowly sinking in, I appreciate your patience in (repeatedly)
>>>>> explaining it to me.
>>>>>
>>>>>
>>>>> I hope I didn’t come through as complaining about that. I was merely
>>>>> acknowledging that it’s very different from other platforms and thus hard
>>>>> to convey to people not working with that platform. I really appreciate
>>>>> your interest.
>>>>>
>>>>> Fred
>>>>>
>>>>
>