[llvm-dev] Proposal for multi location debug info support in LLVM IR

Fri Jan 15 15:04:52 PST 2016

On Fri, Jan 15, 2016 at 3:03 PM, Keno Fischer <kfischer at college.harvard.edu>
wrote:

> We do, ish, but it's not enforced as far as I can tell. I do think there
> is a situation where clang can create such code (don't ask me how though, I
> encountered it while hunting a different bug and just noticed it looked
> odd). This was during an LTO build, so inlining related perhaps?
>

Possible... but I don't know

>
> On Fri, Jan 15, 2016 at 11:55 PM, David Blaikie <dblaikie at gmail.com>
> wrote:
>
>>
>>
>> On Fri, Jan 15, 2016 at 2:44 PM, Keno Fischer <
>> kfischer at college.harvard.edu> wrote:
>>
>>> Adrian had proposed the following staging:
>>>
>>> 1. Remove offset argument from dbg.value
>>> 2. Unify dbg.value and dbg.declare
>>> 3. Full implementation
>>>
>>> I'm not yet sure what to do about the difference in dbg.declare
>>> semantics. For example, I think the following currently works:
>>>
>>> ```
>>> top:
>>> %x = alloca
>>> br else
>>>
>>> if:
>>> dbg.declare(%x...
>>> unreachable
>>>
>>> else:
>>> # dbg.declare still applies here
>>>
>>
>> Hmm - I thought we had some (perhaps undocumented) rule that dbg.declares
>> should all go in the entry with the allocas? I assume Clang follows this
>> rule at least.
>>
>>
>>> ```
>>>
>>> I think it would be reasonable to switch to the proposed dominance
>>> semantics during step 2, but we'll have to see if that negatively affects
>>> any real-world test cases.
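
The dominance question in the example above can be checked mechanically.
What follows is only a rough sketch: the `cfg` dictionary and the helpers
`reachable` and `dominates` are names made up for illustration (they are not
LLVM APIs), and treating an unreachable block as dominating nothing is an
assumption of this toy, not a statement about LLVM's dominator tree.

```python
def reachable(cfg, entry, removed=None):
    """Blocks reachable from `entry`, optionally pretending `removed` is gone."""
    seen, stack = set(), [entry]
    while stack:
        block = stack.pop()
        if block == removed or block in seen:
            continue
        seen.add(block)
        stack.extend(cfg[block])
    return seen


def dominates(cfg, entry, a, b):
    """True if every path from `entry` to `b` passes through `a`."""
    if b not in reachable(cfg, entry):
        return False  # assumption: ignore blocks that cannot be reached at all
    return a == b or b not in reachable(cfg, entry, removed=a)


# CFG of the example: `top` branches to `else`; `if` ends in unreachable
# and has no predecessors.
cfg = {"top": ["else"], "if": [], "else": []}

print(dominates(cfg, "top", "top", "else"))  # True
print(dominates(cfg, "top", "if", "else"))   # False
```

In this toy CFG, `top` dominates `else` but `if` does not, so under dominance
semantics a dbg.declare placed in `if` would no longer describe the variable
in `else`.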
>>>
>>> On Fri, Jan 15, 2016 at 11:38 PM, David Blaikie via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>
>>>> I'm reading/following along - discussion so far sounds reasonable to me.
>>>>
>>>> Only minor note: if dbg.value/declare can be narrowed down to one (I
>>>> think you mentioned in your original proposal that it seemed like
>>>> everything could be just dbg.value?) that'd be a good step, regardless -
>>>> possibly ahead of/while this conversation is underway. Or is it the case
>>>> that the proposed enhanced semantics are required before that transition
>>>> (because currently dbg.value only goes to the end of the BB? if I recall
>>>> correctly, whereas dbg.declare is the whole function)? In the latter case,
>>>> perhaps it'd be a good first step/goal/transition to do as
>>>> cleanup/generalization anyway.
>>>>
>>>> - Dave
>>>>
>>>>> On Jan 6, 2016, at 3:58 PM, Adrian Prantl via llvm-dev <
>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>
>>>>> >
>>>>> > On Jan 5, 2016, at 10:37 AM, Keno Fischer <kfischer at college.harvard.edu> wrote:
>>>>> > On Tue, Jan 5, 2016 at 6:59 PM, Adrian Prantl <aprantl at apple.com> wrote:
>>>>> > Thanks for the clarification, Paul!
>>>>> > Keno, just a few more questions for my understanding:
>>>>> >
>>>>> > >     - Indicating that a value changed at source level (e.g. because an
>>>>> > >       assignment occurred)
>>>>> >
>>>>> > This is done by a key call.
>>>>> >
>>>>> > Correct
>>>>> >
>>>>> > >     - Indicating that the same value is now available in a new location
>>>>> >
>>>>> > Additional, alternative locations with identical contents are added
>>>>> > by passing in the token from a key call.
>>>>> >
>>>>> > Correct
>>>>> >
>>>>> > >     - Indicating that a value is no longer available in some location
>>>>> >
>>>>> > This is done by another key call (possibly with an %undef location).
>>>>> >
>>>>> > Not quite. Another key call could be used if all locations are now
>>>>> > invalid. However, to just remove a single value, I was proposing:
>>>>> >
>>>>> > ; This is the key call
>>>>> > %first = call token @llvm.dbg.value(token undef, %someloc,
>>>>> >                                   metadata !var, metadata !())
>>>>> >
>>>>> > ; This adds a location
>>>>> > %second = call token @llvm.dbg.value(token %first, %someotherloc,
>>>>> >                                   metadata !var, metadata !())
>>>>> >
>>>>> > ; This removes the (%second) location
>>>>> > %third = call token @llvm.dbg.value(token %second, metadata token undef,
>>>>> >                                   metadata !var, metadata !())
>>>>> >
>>>>> > Thus, to remove a location you always pass in the token of the call
>>>>> > that added the location. This is also the reason why I'm requiring the
>>>>> > second argument to be `token undef` because no valid location can be of
>>>>> > type token, and I wanted to avoid the situation in which a location gets
>>>>> > replaced by undef everywhere, accidentally turning into a removal of the
>>>>> > location specified by the key call.
>>>>> >
>>>>> > Makes sense. If I understand your comment correctly, the following snippet:
>>>>> >
>>>>> > %1 = ...
>>>>> > %token = call llvm.dbg.value(token %undef, %1, !var, !())
>>>>> > %2 = ...
>>>>> > call llvm.dbg.value(token %token, %undef, !var, !())
>>>>> > call llvm.dbg.value(token %undef, %2, !var, !())
>>>>> >
>>>>> > is equivalent to
>>>>> >
>>>>> > %1 = ...
>>>>> > call llvm.dbg.value(token %undef, %1, !var, !())
>>>>> > %2 = ...
>>>>> > call llvm.dbg.value(token %undef, %2, !var, !())
>>>>> >
>>>>> > and both are legal.
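
The add/remove rules discussed above can be modelled in a few lines. This is
a sketch under assumptions, not the proposed implementation: the class
`VariableLocations`, its `dbg_value` method, and the `UNDEF` sentinel are
hypothetical names, tokens are plain integers, and `UNDEF` stands in for both
`token undef` and an undef location.

```python
from itertools import count

UNDEF = None  # stands in for `token undef` and for an undef location


class VariableLocations:
    """Toy model of the set of live locations for one source variable."""

    def __init__(self):
        self._ids = count(1)
        self.live = {}  # token of the adding call -> location

    def dbg_value(self, token, location):
        new_token = next(self._ids)
        if token is UNDEF:
            # Key call: the value changed, so every earlier location is stale.
            self.live.clear()
            if location is not UNDEF:
                self.live[new_token] = location
        elif location is UNDEF:
            # Removal: drop the location added by the call that made `token`.
            self.live.pop(token, None)
        else:
            # Additional location holding the same value as the key call.
            self.live[new_token] = location
        return new_token

    def locations(self):
        return sorted(self.live.values())


def first_snippet():
    var = VariableLocations()
    token = var.dbg_value(UNDEF, "%1")
    var.dbg_value(token, UNDEF)  # explicit removal of %1
    var.dbg_value(UNDEF, "%2")   # new key call
    return var.locations()


def second_snippet():
    var = VariableLocations()
    var.dbg_value(UNDEF, "%1")
    var.dbg_value(UNDEF, "%2")   # the key call alone invalidates %1
    return var.locations()


print(first_snippet(), second_snippet())  # ['%2'] ['%2']
```

In this model both snippets end with `%2` as the only live location, which
matches the claim that the explicit removal before a new key call is
redundant.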
>>>>> >
>>>>> > > > >
>>>>> > > > >     - To add a location with the same value for the same variable, you pass the
>>>>> > > > >       token of the FIRST llvm.dbg.value, as this llvm.dbg.value's first argument
>>>>> > > > >       E.g. to add another location for the variable above:
>>>>> > > > >
>>>>> > > > >         %second = call token @llvm.dbg.value(token %first, metadata %val2,
>>>>> > > > >                                              metadata !var, metadata !expr2)
>>>>> > > >
>>>>> > > > Does this invalidate the first location, or does this add an additional location
>>>>> > > > to the set of locations for var at this point? If I want to add a third location,
>>>>> > > > which token do I pass in? Can you explain a bit more what information the token
>>>>> > > > allows us to express that is currently not possible?
>>>>> > > >
>>>>> > >
>>>>> > > It adds a second location. If you want to add a third location you pass in
>>>>> > > the first token again.
>>>>> > > Thus the first call (key call) indicates a change of values, and all
>>>>> > > locations that have the same value should use the key call's token.
>>>>> > >
>>>>> >
>>>>> > Ok. Looks like this is going to be somewhat verbose for partial
>>>>> > updates of SROA’ed aggregates as in the following example:
>>>>> >
>>>>> > // struct s { int i, j };
>>>>> > // void foo(struct s) { s.j = 0; ... }
>>>>> >
>>>>> > define void @foo(i32 %i, i32 %j) {
>>>>> >   %token = call llvm.dbg.value(token %undef, %i, !Struct,
>>>>> >                                !DIExpression(DW_OP_bit_piece(0, 32)))
>>>>> >            call llvm.dbg.value(token %token, %j, !Struct,
>>>>> >                                !DIExpression(DW_OP_bit_piece(32, 32)))
>>>>> >   ...
>>>>> >
>>>>> >   ; have to repeat %i here:
>>>>> >   %tok2 = call llvm.dbg.value(token %undef, %i, !Struct,
>>>>> >                               !DIExpression(DW_OP_bit_piece(0, 32)))
>>>>> >           call llvm.dbg.value(token %tok2, metadata i32 0, !Struct,
>>>>> >                               !DIExpression(DW_OP_bit_piece(32, 32)))
>>>>> >
>>>>> > On the upside, having all this information explicit could simplify
>>>>> > the code in DwarfDebug::buildLocationList().
>>>>> >
>>>>> > Yeah, this is true. We could potentially extend the semantics by
>>>>> > allowing separate key calls for pieces, i.e.
>>>>> >
>>>>> > %token = call llvm.dbg.value(token %undef, %i, !Struct,
>>>>> >                              !DIExpression(DW_OP_bit_piece(0, 32)))
>>>>> >          call llvm.dbg.value(token undef, %j, !Struct,
>>>>> >                              !DIExpression(DW_OP_bit_piece(32, 32)))
>>>>> >
>>>>> > ; This now only invalidates the .j part
>>>>> > %tok2 = call llvm.dbg.value(token %undef, %j, !Struct,
>>>>> >                             !DIExpression(DW_OP_bit_piece(32, 32)))
>>>>> >
>>>>> > In that case we would probably have to require that all
>>>>> > DW_OP_bit_pieces in non-key-call expressions are a subrange of those in the
>>>>> > associated key call.
>>>>> >
>>>>> > This way all non-key-call additional locations are describing
>>>>> > alternative locations for (a subset of) the bits described by the key-call
>>>>> > location. Makes sense, and again would simplify the backend’s work.
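
The subrange requirement mentioned above amounts to a simple interval check.
A minimal sketch, assuming each piece is written as an `(offset, size)` pair
in bits, in the same order as the DW_OP_bit_piece operands used in this
thread; `is_subrange` is a hypothetical helper, not an existing LLVM
function.

```python
def is_subrange(piece, key_piece):
    """True if `piece` lies entirely within `key_piece` (both in bits)."""
    offset, size = piece
    key_offset, key_size = key_piece
    return key_offset <= offset and offset + size <= key_offset + key_size


# Key call describing the .j member: bits [32, 64).
key_j = (32, 32)

print(is_subrange((32, 32), key_j))  # True: the same bits
print(is_subrange((32, 16), key_j))  # True: a subset of the bits
print(is_subrange((0, 32), key_j))   # False: this is the .i member
```

A verifier for the extended semantics could run a check of this kind on every
non-key call against the piece of its associated key call.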
>>>>> >
>>>>> >
>>>>> > Is there any information in the tokens that could not be recovered
>>>>> > by a static analysis of the debug intrinsics?
>>>>> > Note that having redundant information available explicitly is not
>>>>> > necessarily a bad thing.
>>>>> >
>>>>> > I am not entirely sure what you are proposing. You somehow need to
>>>>> > be able to encode which dbg.values invalidate previous locations and which
>>>>> > do not. Since we're describing front-end variables this will generally
>>>>> > depend on front-end semantics, so I'm not sure what a generic analysis pass
>>>>> > can do here without requiring language-specific analysis.
>>>>> >
>>>>> > Right. Determining whether two locations have equivalent contents is
>>>>> > not generally decidable.
>>>>> >
>>>>> > The one difference I noticed so far is that alternative locations
>>>>> > allow earlier locations to outlive locations that are dominated by them:
>>>>> >   %loc = dbg.value(%undef, var, ...)
>>>>> >   ...
>>>>> >   %alt = dbg.value(%loc, var, ...)
>>>>> >   ...
>>>>> >   ; alt becomes unavailable
>>>>> >   ...
>>>>> >   ; %loc is still available here.
>>>>> >
>>>>> > Any other advantages that I missed?
>>>>> >
>>>>> > -- adrian
>>>>> >
>>>>> >
>>>>> > One thing I’m wondering about is whether we couldn’t design a
>>>>> > friendlier (assembler) syntax for the three different use-cases:
>>>>> >   %tok1 = call llvm.dbg.value(token %undef, %1, !var, !())
>>>>> >   %tok2 = call llvm.dbg.value(token %tok1, %2, !var, !())
>>>>> >   %tok3 = call llvm.dbg.value(token %tok1, %undef, !var, !())
>>>>> >
>>>>> > Could be written as e.g.:
>>>>> >
>>>>> >   %tok1 = call llvm.dbg.value.new(%1, !var, !())
>>>>> >   %tok2 = call llvm.dbg.value.add(token %tok1, %2, !var, !())
>>>>> >   %tok3 = call llvm.dbg.value.delete(token %tok1, !var, !())
>>>>> >
>>>>> > -- adrian
>>>>> > _______________________________________________
>>>>> > LLVM Developers mailing list
>>>>> > llvm-dev at lists.llvm.org
>>>>> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>