[llvm-dev] Proposal for multi location debug info support in LLVM IR

Wed Jan 6 13:58:54 PST 2016

> On Jan 5, 2016, at 10:37 AM, Keno Fischer <kfischer at college.harvard.edu> wrote:
> 
> On Tue, Jan 5, 2016 at 6:59 PM, Adrian Prantl <aprantl at apple.com> wrote:
> Thanks for the clarification, Paul!
> Keno, just a few more questions for my understanding:
> 
> >     - Indicating that a value changed at source level (e.g. because an
> >       assignment occurred)
> 
> This is done by a key call.
> 
> Correct
>  
> >     - Indicating that the same value is now available in a new location
> 
> Additional, alternative locations with identical contents are added by passing in the token from a key call.
> 
> Correct
>  
> >     - Indicating that a value is no longer available in some location
> 
> This is done by another key call (possibly with an %undef location).
> 
> Not quite. Another key call could be used if all locations are now invalid. However, to remove just a single location, I was proposing:
> 
> ; This is the key call
> %first = call token @llvm.dbg.value(token undef, %someloc,
>                                   metadata !var, metadata !())
> 
> ; This adds a location
> %second = call token @llvm.dbg.value(token %first, %someotherloc,
>                                   metadata !var, metadata !())
> 
> ; This removes the (%second) location
> %third = call token @llvm.dbg.value(token %second, metadata token undef,
>                                   metadata !var, metadata !())
> 
> Thus, to remove a location you always pass in the token of the call that added the location. This is also why I'm requiring the second argument to be `token undef`: no valid location can be of type token. I wanted to avoid the situation in which a location gets replaced by undef everywhere and thereby accidentally turns into a removal of the location specified by the key call.
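The add/remove rules above can be made concrete with a small executable model. This is a sketch only. It is written in Python rather than IR because the token-based llvm.dbg.value is still a proposal, and it assumes straight-line code with no control flow. `DbgValueModel`, `VarState` and `REMOVE` are hypothetical names. An integer stands in for each returned token, `None` as the first argument stands in for `token undef` (a key call), and `REMOVE` stands in for a `token undef` location.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical sentinel for the `token undef` location argument that requests a removal.
REMOVE = object()


@dataclass
class VarState:
    """Live locations of one source variable, keyed by the token of the call that added them."""
    locations: Dict[int, str] = field(default_factory=dict)
    key: Optional[int] = None  # token of the most recent key call


class DbgValueModel:
    """Hypothetical model of the proposed token-based llvm.dbg.value semantics."""

    def __init__(self) -> None:
        self.next_token = 0
        self.vars: Dict[str, VarState] = {}

    def dbg_value(self, parent: Optional[int], location, var: str) -> int:
        """parent=None models `token undef` as the first argument, i.e. a key call."""
        state = self.vars.setdefault(var, VarState())
        token = self.next_token
        self.next_token += 1
        if parent is None:
            # Key call: the source-level value changed, so all earlier locations die.
            # A location of None models a key call whose location is undef.
            state.key = token
            state.locations = {} if location is None else {token: location}
        elif location is REMOVE:
            # Removal: drop only the location introduced by the call whose token is passed.
            state.locations.pop(parent, None)
        else:
            # Alternative location with identical contents: must cite the current key call.
            if parent != state.key:
                raise ValueError("additional locations must pass the key call's token")
            state.locations[token] = location
        return token

    def live(self, var: str) -> list:
        state = self.vars.get(var)
        return sorted(state.locations.values()) if state else []


# Replaying the %first / %second / %third example from the message.
m = DbgValueModel()
first = m.dbg_value(None, "%someloc", "var")
second = m.dbg_value(first, "%someotherloc", "var")
assert m.live("var") == ["%someloc", "%someotherloc"]
third = m.dbg_value(second, REMOVE, "var")
assert m.live("var") == ["%someloc"]
```

The model rejects an alternative location that cites anything other than the current key call's token. This mirrors the "pass in the first token again" rule for a third location.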

Makes sense. If I understand your comment correctly, the following snippet:

%1 = ...
%token = call llvm.dbg.value(token undef, %1, !var, !())
%2 = ...
call llvm.dbg.value(token %token, token undef, !var, !())
call llvm.dbg.value(token undef, %2, !var, !())

is equivalent to

%1 = ...
call llvm.dbg.value(token undef, %1, !var, !())
%2 = ...
call llvm.dbg.value(token undef, %2, !var, !())

and both are legal.
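The equivalence can be checked mechanically by replaying both snippets under the same rules. This minimal sketch assumes straight-line code and a single variable. The token names `%rm`, `%new`, `%t1` and `%t2` are hypothetical labels for calls the snippets leave unnamed. The model compares only the final set of live locations, not the momentary empty set between the removal and the new key call.

```python
from typing import Iterable, Optional, Set, Tuple

UNDEF_TOKEN = "token undef"  # second argument that marks a removal

# One dbg.value call for a single variable: (token name, first argument, location).
# A first argument of None stands for `token undef`, i.e. a key call.
Call = Tuple[str, Optional[str], str]


def live_locations(calls: Iterable[Call]) -> Set[str]:
    """Replay dbg.value calls under the proposed rules and return the final live set."""
    live = {}  # token name -> location
    key = None
    for name, parent, location in calls:
        if parent is None:
            key = name
            live = {name: location}  # key call: earlier locations are invalid
        elif location == UNDEF_TOKEN:
            live.pop(parent, None)  # remove the location added by `parent`
        else:
            assert parent == key, "additional locations must use the key call's token"
            live[name] = location  # alternative location, same contents
    return set(live.values())


with_removal = [
    ("%token", None, "%1"),
    ("%rm", "%token", UNDEF_TOKEN),  # explicit removal of %1
    ("%new", None, "%2"),            # key call for the new value
]
without_removal = [
    ("%t1", None, "%1"),
    ("%t2", None, "%2"),  # the key call alone already invalidates %1
]
assert live_locations(with_removal) == live_locations(without_removal) == {"%2"}
```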

> > > >
> > > >     - To add a location with the same value for the same variable, you
> > > >       pass the token of the FIRST llvm.dbg.value as this llvm.dbg.value's
> > > >       first argument. E.g. to add another location for the variable above:
> > > >
> > > >         %second = call token @llvm.dbg.value(token %first, metadata %val2,
> > > >                                             metadata !var, metadata !expr2)
> > >
> > > Does this invalidate the first location, or does this add an additional
> > > location to the set of locations for var at this point? If I want to add a
> > > third location, which token do I pass in? Can you explain a bit more what
> > > information the token allows us to express that is currently not possible?
> > >
> >
> > It adds a second location. If you want to add a third location you pass in
> > the first token again.
> > Thus the first call (key call) indicates a change of values, and all
> > locations that have the same value should use the key call's token.
> >
> 
> Ok. Looks like this is going to be somewhat verbose for partial updates of SROA’ed aggregates as in the following example:
> 
> // struct s { int i, j; };
> // void foo(struct s s) { s.j = 0; ... }
> 
> define void @foo(i32 %i, i32 %j) {
>   %token = call llvm.dbg.value(token undef, %i, !Struct, !DIExpression(DW_OP_bit_piece(0, 32)))
>            call llvm.dbg.value(token %token, %j, !Struct, !DIExpression(DW_OP_bit_piece(32, 32)))
>   ...
> 
>   ; have to repeat %i here:
>   %tok2 = call llvm.dbg.value(token undef, %i, !Struct, !DIExpression(DW_OP_bit_piece(0, 32)))
>           call llvm.dbg.value(token %tok2, metadata i32 0, !Struct, !DIExpression(DW_OP_bit_piece(32, 32)))
> 
> On the upside, having all this information explicit could simplify the code in DwarfDebug::buildLocationList().
> 
> Yeah, this is true. We could potentially extend the semantics by allowing separate key calls for pieces, i.e.
>  
> %token = call llvm.dbg.value(token undef, %i, !Struct, !DIExpression(DW_OP_bit_piece(0, 32)))
>          call llvm.dbg.value(token undef, %j, !Struct, !DIExpression(DW_OP_bit_piece(32, 32)))
> 
> ; This now only invalidates the .j part
> %tok2 = call llvm.dbg.value(token undef, %j, !Struct, !DIExpression(DW_OP_bit_piece(32, 32)))
> 
> In that case we would probably have to require that all DW_OP_bit_pieces in non-key-call expressions are a subrange of those in the associated key call.

This way all non-key-call additional locations describe alternative locations for (a subset of) the bits described by the key-call location. Makes sense, and again would simplify the backend’s work.
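The subrange requirement for per-piece key calls can be stated as a simple check. This is a sketch only. It assumes DW_OP_bit_piece's two operands are (offset, size) in bits, which is what the (0, 32) and (32, 32) pieces for .i and .j above imply. `BitPiece` and `valid_alternative` are hypothetical helpers, not existing LLVM code.

```python
from typing import NamedTuple, Optional


class BitPiece(NamedTuple):
    """Operands of DW_OP_bit_piece, assumed here to be (offset, size) in bits."""
    offset: int
    size: int

    def covers(self, other: "BitPiece") -> bool:
        """True if every bit of `other` lies inside this piece."""
        return (self.offset <= other.offset
                and other.offset + other.size <= self.offset + self.size)


def valid_alternative(key_piece: Optional[BitPiece], alt_piece: Optional[BitPiece]) -> bool:
    """Proposed rule: a non-key call may only describe bits its key call describes.

    None means the expression has no DW_OP_bit_piece, i.e. it covers the whole variable.
    """
    if key_piece is None:
        return True   # key call describes the whole variable
    if alt_piece is None:
        return False  # whole-variable alternative for a partial key call
    return key_piece.covers(alt_piece)


# struct s { int i, j; } split by SROA into two 32-bit pieces.
J = BitPiece(32, 32)
assert valid_alternative(J, BitPiece(32, 32))     # another home for .j
assert valid_alternative(J, BitPiece(32, 16))     # first 16 bits of .j only
assert not valid_alternative(J, BitPiece(0, 32))  # .i is not covered by .j's key call
```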

> 
> Is there any information in the tokens that could not be recovered by a static analysis of the debug intrinsics?
> Note that having redundant information available explicitly is not necessarily a bad thing.
> 
> I am not entirely sure what you are proposing. You somehow need to be able to encode which dbg.values invalidate previous locations and which do not. Since we're describing front-end variables this will generally depend on front-end semantics, so I'm not sure what a generic analysis pass can do here without requiring language-specific analysis.

Right. Determining whether two locations have equivalent contents is not generally decidable.

> The one difference I noticed so far is that alternative locations allow earlier locations to outlive locations that are dominated by them:
>   %loc = dbg.value(token undef, var, ...)
>   ...
>   %alt = dbg.value(token %loc, var, ...)
>   ...
>   ; alt becomes unavailable
>   ...
>   ; %loc is still available here.
> 
> Any other advantages that I missed?
> 
> -- adrian

One thing I’m wondering about is whether we couldn’t design a friendlier (assembler) syntax for the three different use-cases:
  %tok1 = call llvm.dbg.value(token undef, %1, !var, !())
  %tok2 = call llvm.dbg.value(token %tok1, %2, !var, !())
  %tok3 = call llvm.dbg.value(token %tok1, token undef, !var, !())

Could be written as e.g.:

  %tok1 = call llvm.dbg.value.new(%1, !var, !())
  %tok2 = call llvm.dbg.value.add(token %tok1, %2, !var, !())
  %tok3 = call llvm.dbg.value.delete(token %tok1, !var, !())
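The suggested spellings map one-to-one onto the token-based form, so they could be pure assembler sugar. Here is a sketch of that lowering. It assumes `llvm.dbg.value.new`, `.add` and `.delete` exist only as printed/parsed forms, and that `.delete` lowers to a call with a `token undef` location, as proposed earlier in the thread. `lower` is a hypothetical helper.

```python
from typing import Optional


def lower(kind: str, var: str, expr: str,
          loc: Optional[str] = None, token: Optional[str] = None) -> str:
    """Rewrite the hypothetical .new/.add/.delete forms into the single token-based call."""
    if kind == "new":
        if loc is None:
            raise ValueError("new needs a location")
        first, second = "token undef", loc
    elif kind == "add":
        if loc is None or token is None:
            raise ValueError("add needs the key call's token and a location")
        first, second = f"token {token}", loc
    elif kind == "delete":
        if token is None:
            raise ValueError("delete needs the token of the call that added the location")
        first, second = f"token {token}", "token undef"
    else:
        raise ValueError(f"unknown form: {kind}")
    return f"call token @llvm.dbg.value({first}, {second}, {var}, {expr})"


print(lower("new", "!var", "!()", loc="%1"))
# prints: call token @llvm.dbg.value(token undef, %1, !var, !())
print(lower("add", "!var", "!()", loc="%2", token="%tok1"))
# prints: call token @llvm.dbg.value(token %tok1, %2, !var, !())
print(lower("delete", "!var", "!()", token="%tok1"))
# prints: call token @llvm.dbg.value(token %tok1, token undef, !var, !())
```

Since `.delete` never takes a location, the `token undef` convention that guards against accidental removals would become a detail of the parser and printer rather than something users write by hand.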

-- adrian