[PATCH] Avoid empty .debug_loc entries and lists

Fri Jun 5 11:16:09 PDT 2015

On Fri, Jun 5, 2015 at 10:59 AM, Adrian Prantl <aprantl at apple.com> wrote:

>
> On Jun 5, 2015, at 10:17 AM, David Blaikie <dblaikie at gmail.com> wrote:
>
>
>
> On Fri, Jun 5, 2015 at 10:05 AM, Adrian Prantl <aprantl at apple.com> wrote:
>
>>
>> On Jun 5, 2015, at 9:45 AM, David Blaikie <dblaikie at gmail.com> wrote:
>>
>>
>>
>> On Fri, Jun 5, 2015 at 9:34 AM, Adrian Prantl <aprantl at apple.com> wrote:
>>
>>>
>>> On Jun 4, 2015, at 5:33 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>>
>>>
>>>
>>> On Thu, Jun 4, 2015 at 5:23 PM, Duncan P. N. Exon Smith <
>>> dexonsmith at apple.com> wrote:
>>>
>>>>
>>>> > On 2015 Jun 4, at 16:50, David Blaikie <dblaikie at gmail.com> wrote:
>>>> >
>>>> >
>>>> >
>>>> > On Thu, Jun 4, 2015 at 4:46 PM, Duncan P. N. Exon Smith <
>>>> dexonsmith at apple.com> wrote:
>>>> >
>>>> > > On 2015 Jun 4, at 16:36, David Blaikie <dblaikie at gmail.com> wrote:
>>>> > >
>>>> > >
>>>> > >
>>>> > > On Thu, Jun 4, 2015 at 4:28 PM, Duncan P. N. Exon Smith <
>>>> dexonsmith at apple.com> wrote:
>>>> > >
>>>> > > > On 2015 Jun 4, at 16:09, David Blaikie <dblaikie at gmail.com>
>>>> wrote:
>>>> > > >
>>>> > > > Judging by the code, does this cause us not to create a variable
>>>> if it has no locations? That's probably not quite right - even if we have
>>>> no locations, it's probably a good idea/correct to describe the variable so
>>>> shadowing, etc, works correctly (if someone tries to print the variable the
>>>> compiler still finds the right variable but just reports that it doesn't
>>>> know the location)
>>>> > >
>>>> > > So the code here already has an early return if there are no ranges:
>>>> > >
>>>> > >     for (const auto &I : DbgValues) {
>>>> > >       InlinedVariable IV = I.first;
>>>> > >       if (Processed.count(IV))
>>>> > >         continue;
>>>> > >
>>>> > >       // Instruction ranges, specifying where IV is accessible.
>>>> > >       const auto &Ranges = I.second;
>>>> > >       if (Ranges.empty())
>>>> > >         continue;
>>>> > >
>>>> > > My patch effectively hits this `continue` more often.  The
>>>> difference
>>>> > > is cases where we *thought* we knew how to emit ranges, but in fact
>>>> > > didn't know how.
>>>> > >
>>>> > > I suppose I assumed the intended behaviour was "don't emit the
>>>> > > variable if we don't have any ranges for it" since that's what was
>>>> > > going on, but are you saying this `continue` is a bug?  (Or am I
>>>> > > totally missing your point?)
>>>> > >
>>>> > > How do we end up with an empty range list here? That seems strange
>>>> - I assume we're just lazily creating the rang lists in
>>>> calculateDbgValueHistory... (I could sort of understand empty entries in a
>>>> range list - when it turns out the dbg_value intrinsic describes no
>>>> instructions due to things being optimized away, hoisted here or there,
>>>> etc).
>>>> >
>>>> > No idea, I just assume it happens because there's code.
>>>> >
>>>> > Not sure what you mean.
>>>> >
>>>> > What I'm trying to understand is whether or not we should be fixing
>>>> this bug closer to the source (in calculateDbgValueHistory) that's
>>>> producing empty ranges or empty entries.
>>>>
>>>> One thing that Adrian walked me through is that even if we have
>>>> ranges, and we have entries after the call to `buildLocationList()`,
>>>> we can still fail to emit any bytes during `DebugLocEntry::finalize()`
>>>> since we don't know how to emit every DWARF expression.
>>>>
>>>
>>> How are we building DWARF expressions we can't emit? Shouldn't we just
>>> not build those? (I guess this is more of a question for Adrian)
>>>
>>>
>>> There are two cases that I’m aware of: Constant float expressions and
>>> MachineRegisters that cannot be synthesized in DWARF. Constant floats
>>> cannot be expressed in DWARF. DwarfExpression is trying very hard to
>>> synthesize registers that that don’t have a DWARF register number out of
>>> combinations/pieces of sub- and super-registers but may fail. Basically we
>>> have to try to build the expression before we know whether it can be built
>>> or not.
>>>
>>
>> Sure - but at the end, if we decide we couldn't build the expression,
>> wouldn't we just throw out the whole entry? (& if we end up with no
>> entries, we'd not add the range list, etc)
>>
>>
>>>
>>>
>>>
>>>> Because of that, I changed my approach from "trying to get `Ranges` to
>>>> match up with `Entries`" to "popping empty/useless entries and ranges
>>>> off the stack", and figured Ranges/Entries inconsistencies were a
>>>> compile-time optimization that weren't as urgent.
>>>>
>>>
>>> My perspective is just to understand why they're there in the first
>>> place - I dislike solutions/fixes/patches that are predicated on "I don't
>>> know why this state occurs, but I can handle it here" - they feel
>>> unsatisfying & can end up layering workarounds on workarounds when there
>>> was some underlying issues that could've been addressed & lead to simpler
>>> code with stronger invariants.
>>>
>>> But if there's a reason for them (Empty range lists or empty ranges in
>>> the range lists) that's cool - I just don't know it & wanted to understand
>>> it before I'd be comfortable with the patch.
>>>
>>>
>>> I’m pretty confident that empty ranges only happen because we use a
>>> (too) simple heuristic to decide wether to emit a location in-line or as a
>>> .debug_loc entry. (That heuristic being whether there is only a single
>>> DBG_VALUE per bit_piece of the variable).
>>>
>>
>> I'm a bit confused - I figured we'd build up the range list - then, if
>> the length of that list was one we'd use an in-line location otherwise we'd
>> use a debug_loc entry. (& if the length of the list was zero, we wouldn't
>> add it in the first place)
>>
>>
>> That would off course be much saner. In fact Duncan’s patch gets us there.
>>
>
> Perhaps I've lost all basis for this discussion, sorry - it looked like
> Duncan's patch it filters the results after DbgValues and the ranges it
> contains have been built.
>
> What I'm trying to understand is what the motivation/justification is for
> not enforcing a stronger invariant on the code that builds this data
> structure (calculateDbgValueHistory) so it never produces empty ranges in
> this list in the first place.
>
>
> Oh, got it now :-)
>
> In my opinion, it shouldn’t necessarily be calculateDbgValueHistory’s job
> to have to care about the DWARF encoding.
>

Fair enough - still vaguely confused why we don't know until this late.
(not having numbers for certain registers? Isn't that something the
platform can choose & we can fix at our discretion (pick numberings)?
floating point constants, etc we could detect at the time we create the
metadata and/or, again, pick an encoding?) dunno)

> I mean, if we want to, we could have some sort of bool
> DwarfExpression::canEncode() functionality, but it would then also have to
> know about pieces, which could complicate the code and make it much harder
> to reason about. Having a separation between calculateDbgValueHistory which
> flattens the graph into a list of ranges and then a second
> buildLocationList pass operating on that list to coalesce adjacent ranges
> and break up longer ranges where ranges for other pieces of the same
> variable start or end is maybe a better separation of work.
>
> That said, I don’t find the current buildLocationList particularily
> elegant, and Fred is planning to rewrite DbgValeHistoryCalculator in the
> near future, so we will definitely revisit this when this comes up.
>

Rightio - I'll keep an eye out for it.

Looks OK then - thanks for all your patience.

- Dave

>
> -- adrian
>
> (sorry, I realize the ambiguity now - I said "avoid empty range entries"
> and you reasonably read that as referring to the output -  when what I was
> trying to refer to was the internal data structure "DbgValues" that
> contains the range data we're working from)
>
>
>>
>> It doesn’t handle the single entry special case, but it also shouldn’t: I
>> just realized that it is not necessarily correct to emit a single-entry
>> debug_loc entry inline, because in a case like:
>>
>>   l1: DBG_VALUE, var, reg0
>>   ...
>>   l2: DBG_VALUE, var, noreg
>>   ...
>>
>> we’d still need to emit a single-entry debug_loc entry to describe that
>> the reg0 location is only valid in the range [l1..l2).
>>
>
> Agreed.
>
>
>>
>>
>>
>>> If building one or more expressions fails then we may end up with one
>>> (should have been inline instead) or zero (shouldn’t have a DW_AT_location
>>> at all) .debug_loc entries.
>>>
>>
>> Can we practically/easily remove those at the source?
>>
>>
>> Yes: DwarfDebug::collectVariableInfo() should invoke
>> RegVar->setDebugLocListIndex() _after_ buildLocationList() and only if the
>> location list is nonempty.
>>
>> -- adrian
>>
>>
>>
>>>
>>> -- adrian
>>>
>>>
>>>
>>>> Maybe we should also be adding the commented out code I had in:
>>>>
>>>> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150525/278797.html
>>>>
>>>> It also seems like the `continue` in the current code might be the wrong
>>>> thing (and we should create the variable with no DW_AT_location?).
>>>>
>>>
>>> As you mentioned, the loop that catches optimized-out variables is
>>> probably fine for all these cases (because there are cases where all the
>>> dbg_values could be optimized away & we'd find no evidence of the variable
>>> at all, I should imagine - so that logic has to be up to the task of coping
>>> with these cases).
>>>
>>>
>>>> (All of this stuff is a tangent for me.  I wandered in here to remove
>>>> the empty .debug_loc entries/lists in the output.)
>>>>
>>>
>>> Right - just trying to make sure I understand why this is the right
>>> place to fix that issue & not working around some other issue/missing
>>> invariant.
>>>
>>>
>>>> > What's the right thing to do?
>>>> >
>>>> > I don't have all the context here to know that off-hand. Trying to
>>>> discuss it to try to wrap my head around enough of it to offer a useful
>>>> opinion - sorry it takes a while.
>>>>
>>>> Not trying to rush you, sorry if it seems that way.  This isn't really
>>>> blocking me (I have other things to work on (although obviously I'd
>>>> like to resolve this some time)).
>>>>
>>>> > Should we create a variable the same
>>>> > was as the for loop below for function arguments (that doesn't
>>>> > attach ranges)?  Should we not create a variable?
>>>> >
>>>> > If we get the same output between creating it here or in the later
>>>> loop, it doesn't matter in the broad sense.
>>>> >
>>>> > Is it actually
>>>> > *useful* that we create a .debug_loc list with no entries and point
>>>> > to it?
>>>> >
>>>> > No, there's no benefit to pointing to an empty loc list compared to
>>>> just having no DW_AT_location at all.
>>>>
>>>> Thanks for confirming.
>>>>
>>>> I'd be happy to adjust my patch to just address the empty .debug_loc
>>>> list/entries and continue to create the same variables that were being
>>>> created before (just missing whatever magic makes them create a
>>>> `DW_AT_location` -- I'm guessing I just use the constructor logic
>>>> that's used for optimized-out function parameters in the loop that
>>>> follows?).
>>>>
>>>
>>> No, I think that's fine as is - as you pointed out, we have that other
>>> loop for emitting optimized-out variables and we have other logic that's
>>> skipping variables here already.
>>>
>>>
>>>> (It sounds like it was a poor assumption on my part that we were trying
>>>> to avoid emitting variables if they have no ranges.)
>>>
>>>
>>>
>>>
>>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150605/ffd153b5/attachment.html>