[llvm-dev] Proposal for multi location debug info support in LLVM IR

Mon Jan 4 12:45:33 PST 2016

Address ranges in a location list may overlap (section 2.6.2).  The entries in a location list are not a range list (which is defined by section 2.17).
--paulr

From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Keno Fischer via llvm-dev
Sent: Monday, January 04, 2016 12:11 PM
To: Adrian Prantl
Cc: llvm-dev
Subject: Re: [llvm-dev] Proposal for multi location debug info support in LLVM IR

Thanks for your comments. Replies inline.

The DWARF 5 standard says that
"Address range entries in a range list may not overlap."

The reasoning behind this is presumably that if a variable is in more than one
location at a point, all the values need to be identical, or the information is useless.

Oh huh, for some reason I was under the impression that they could. No matter, all we would have to do then is choose one in the backend. I think it makes sense to maintain the notion of separate multiple locations until then.

>
>     - To add a location with the same value for the same variable, you pass the
>       token of the FIRST llvm.dbg.value, as this llvm.dbg.value's first argument
>       E.g. to add another location for the variable above:
>
>         %second = call token @llvm.dbg.value(token %first, metadata %val2,
>                                             metadata !var, metadata !expr2)

Does this invalidate the first location, or does this add an additional location
to the set of locations for var at this point? If I want to add a third location,
which token do I pass in? Can you explain a bit more what information the token
allows us to express that is currently not possible?

It adds a second location. If you want to add a third location you pass in the first token again.
Thus the first call (key call) indicates a change of values, and all locations that have the same value should use the key call's token.
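
To make the bookkeeping concrete, here is a rough Python model of how the set of locations could be tracked for one variable. It is only a sketch of the semantics discussed in this thread, not code from LLVM: DbgValue and current_locations are made-up names, None stands in for token undef, and dominance is approximated by "appears earlier in a straight-line list of calls".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DbgValue:
    """Hypothetical model of one proposed llvm.dbg.value call."""
    name: str             # token produced by the call, e.g. "%first"
    first: Optional[str]  # first argument; None models `token undef` (a key call)
    value: Optional[str]  # second argument; None models `token undef`


def current_locations(calls: List[DbgValue]) -> List[str]:
    """Return the values describing the variable after all calls have run.

    Assumption: calls are in straight-line order, so every earlier call
    dominates every later one.
    """
    key = None  # name of the most recent key call
    live = {}   # call name -> value it describes
    for c in calls:
        if c.first is None:
            # Key call: the variable changed value, so drop the old set.
            key, live = c.name, {}
            if c.value is not None:
                live[c.name] = c.value
        elif c.value is None:
            # End marker: the location introduced by call `c.first` is gone.
            live.pop(c.first, None)
        elif c.first == key:
            # Additional location holding the same value as the key call.
            live[c.name] = c.value
    return list(live.values())
```

With %first, %second and a third call that again passes %first, the model reports three locations. Ending %second leaves two, and a new key call starts over with a single location.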

>
>     - To indicate that a location will no longer hold a value, you do the
>       following:
>
>         call token @llvm.dbg.value(token %second, metadata token undef,
>                                   metadata !var, metadata !())
>
>     - The current set of locations for a variable at a given instruction are all
>       those llvm.dbg.value instructions that dominate this location (
>       equivalently all those llvm.dbg.value calls whose token you could use at
>       that location without upsetting the Verifier), except that if more than
>       one key call is dominating, only the most recent one and all calls
>       associated to it by first argument count.
>
> I think that should encapsulate the semantics, but here are some consequences
> of and comments on the above that I think would be useful to discuss:
>
>     - The upgrade path for existing IR is very simple and just consists of
>       adding token undef as the first argument to any call in the IR.
>
>     - In general, if a value gets removed by an optimization, the corresponding
>       llvm.dbg.value call can be removed, unless that call is a key call, in
>       which case the value should be undefed out. This is necessary both to be
>       able to keep it around as the first argument to the other calls, and more
>       importantly to mark the end point of a previous set of locations.

So if %val is optimized out in the following example:

  %first = call token @llvm.dbg.value(token undef, metadata %val,
                                      metadata !var, metadata !expr)
  ...
  %second = call token @llvm.dbg.value(token %first, metadata %val2,
                                       metadata !var, metadata !expr2)

Does this turn into:

  call token @llvm.dbg.value(token undef, metadata token undef,
                             metadata !var, metadata !expr)
  %second = call token @llvm.dbg.value(token undef, metadata %val2,
                                       metadata !var, metadata !expr2)

Or do we still have a %first token, or does the key call get removed entirely, because
the second one is now a key call?

I think the situation is the following:
If %second is the only use of %first, we can do that optimization. If not, and %second dominates all uses of %first, we could also do this optimization and replace all uses of %first with %second. However, we cannot remove the actual first key call, because it denotes the end location for the previous value of the same variable. The two exceptions I could think of are if %first is the first call for that variable in the function (as then there cannot be a previous range to terminate), or if there are no other calls or memory operations in between %first and %second, in which case we could hoist %second up and merge the two calls. Does that make sense?
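
As a rough illustration of the two exceptions, here is a small Python predicate over a single basic block. It is a sketch under assumptions, not an LLVM API: can_remove_key_call and the (kind, var) encoding are invented for this example, it assumes all uses of %first have already been rewritten to %second, and it treats dbg.value calls for other variables as not blocking the hoist.

```python
def can_remove_key_call(instrs, first, second):
    """Decide whether the undefed-out key call at index `first` may be deleted.

    instrs: list of (kind, var) tuples in program order for one basic block.
            kind is "dbg" (an llvm.dbg.value call), "call", "mem" or "other";
            var names the described variable for "dbg" entries, else None.
    first:  index of the key call whose value was optimized out.
    second: index of the call that becomes the new key call.
    """
    var = instrs[first][1]
    # Exception 1: no earlier dbg.value for this variable, so there is no
    # previous range that the key call would have to terminate.
    no_previous = all(not (k == "dbg" and v == var) for k, v in instrs[:first])
    # Exception 2: no calls or memory operations between the two, so the
    # second call can be hoisted up and merged with the first.
    mergeable = all(k not in ("call", "mem") for k, _ in instrs[first + 1:second])
    return no_previous or mergeable
```

If neither exception holds, the key call has to stay (with its value undefed out) so that the previous set of locations still has an end point.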

>
>     - I think llvm.dbg.declare can be deprecated and its uses replaced by
>       llvm.dbg.value with a DW_OP_deref. That would also clarify the semantics
>       of the operation, which have caused some confusion in the past.
I think we could already remove it today without any loss of generality (by
lifting any dbg.value whose first argument is an alloca into the MMI table).
What I see this proposal adding is a way to mark the end of a range, which
is important when a value is on the stack only for part of the function (as
in the stack coloring example).

Agreed!

>
>     - We may want to add an extra pass that does debug info inference (some of
>       which is done in InstCombine right now)

What kind of inference does InstCombine do currently?

I was thinking of the replacement of llvm.dbg.declare with appropriate llvm.dbg.value calls at each load/store.
In the new design that would essentially be an inference pass that adds those as
locations, with the original one only removed if the alloca actually gets lifted into registers.

>
> Here are some of the invariants the verifier would enforce (included in the
> hope that they can clarify anything in the above):
>
>     1. If the first argument is not token undef, then
>         a. If the second argument is not token undef,
>             I. the first argument must be a call to llvm.dbg.value whose first
>                argument is token undef
>         b. If the second argument is token undef
>             II.  the first argument must be a call to llvm.dbg.value whose second
>                  argument is not token undef
>             III. the expression argument must be empty
>         c. In either case, the variable described must be the same as the one
>            described by the call that is the first argument.
>         d. There may not be another call to llvm.dbg.value with token undef
>            that dominates this instruction, is not the one passed as the first
>            argument and is dominated by the one passed as the first argument.
>     2. All other invariants regarding calls to llvm.dbg.value carry over
>        unchanged
>
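
For reference, invariants 1.a through 1.c can be written down as a small Python check. This is only a sketch of the rules as quoted, not the actual Verifier: DbgValue and verify are hypothetical names, None models token undef, and invariant 1.d is left out because it needs dominance information that this model does not have.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DbgValue:
    """Hypothetical model of one proposed llvm.dbg.value call."""
    name: str             # token produced by the call
    first: Optional[str]  # first argument; None models `token undef`
    value: Optional[str]  # second argument; None models `token undef`
    var: str              # variable being described
    expr: tuple = ()      # expression operands; () models an empty expression


def verify(call: DbgValue, by_name: Dict[str, DbgValue]) -> List[str]:
    """Return the list of violated invariants for `call` (empty if none)."""
    if call.first is None:
        return []  # invariant 1 only constrains calls with a real first token
    target = by_name.get(call.first)
    if target is None:
        return ["1: first argument is not a call to llvm.dbg.value"]
    errors = []
    if call.value is not None:
        if target.first is not None:
            errors.append("1.a.I: first argument must be a key call")
    else:
        if target.value is None:
            errors.append("1.b.II: cannot end a call that is itself an end marker")
        if call.expr:
            errors.append("1.b.III: expression must be empty")
    if call.var != target.var:
        errors.append("1.c: variable differs from the referenced call")
    return errors
```

A second location that passes the key call's token, and an end marker with an empty expression that passes the token of the location it ends, both come back with no errors.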

-- adrian
