[lldb-dev] Prologue instructions having line information

Fri Sep 22 16:03:15 PDT 2017

> On Sep 14, 2017, at 3:32 PM, Jim Ingham <jingham at apple.com> wrote:
> 
> This is supported (admittedly a little awkwardly) in DWARF with the DW_TAG_inlined_subroutine DIEs in the debug_info section of the DWARF.  They can express the nesting fully.
> 

Sorry for the delay in responding.

For now, I don’t have anything interesting to say about how
lldb keeps track of which logical function it’s in when
a single instruction maps to multiple functions.  Jim’s comments
seem logical.  Every place in lldb that converts an instruction
location into a function is a place to look at the code and
think about doing something fancier.

I did have some thoughts on the range list encoding.

What follows is not a recommendation, it's just random
musings.

range lists...
They seem to take up more space than they need to.

So for a single inlined instance you need (assuming 64-bit
addresses, obviously):

2x64 base address
2x64 end of list 

Plus 2x64 for every contiguous block of instructions.
Assuming the usual case, where inlining is mixed with
code scheduling and other optimizations, a single
inlined instance is likely to be split up pretty
thoroughly. (These are all logical guesses; I haven't
gathered any statistics.)

So if you've got 10 instructions split into 5 chunks
then you need (1+5+1) * (64*2) bits, which is 112 bytes.
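
As a rough check of that arithmetic, here is a minimal Python sketch.
The function name and the layout it assumes (one base address entry,
one pair of address-sized words per chunk, one end-of-list entry) are
just my reading of the description above, not code from any tool.

```python
ADDRESS_BYTES = 8  # assumption: 64-bit target, so each address word is 8 bytes


def range_list_bytes(num_chunks, address_bytes=ADDRESS_BYTES):
    """Estimate the size of one range list for a single inlined instance.

    Hypothetical helper: every entry is a pair of address-sized words.
    """
    entry = 2 * address_bytes      # one (begin, end) style pair
    base_address = entry           # entry that sets the base address
    end_of_list = entry            # terminating entry
    return base_address + num_chunks * entry + end_of_list


# 5 chunks: (1 + 5 + 1) entries * 16 bytes each
print(range_list_bytes(5))
```

With 5 chunks this gives 7 entries of 16 bytes each, matching the
112 bytes above.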

Since most inlined functions are very small
the overhead of 4 address words just to open
up a range list seems pretty inefficient.

You're effectively painting a boolean property
on a subset of the instructions in a function.
With one property per inlined instance.

In the old Sun days we had a desire to do this
for a couple of different kinds of information so
we invested in a fairly concise way to record it.

We just made a sorted list of the instruction offsets
from the start of the containing function.  Then we encoded
the sorted list of numbers differentially and stored
it as a list of ULEBs in a raw data block attribute attached
to the die that describes the property.
It was used for describing ctor and dtor code blocks
among other things.

So if you have these function offsets of instructions:

1001 1002 1005 1010 1030

You end up with a list of LEBs like:

1001 1 3 5 20    (the last 4 numbers are the
                  differences between adjacent pairs)

And since LEBs are fairly short when the values are small,
it doesn't take much space.  So for this example I think
it would be just 6 bytes.
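
To check the byte count, here is a small Python sketch of the
differential ULEB encoding described above.  The function names are
made up for illustration, and this is not the actual Sun/Studio
implementation, just the same idea written out.

```python
def uleb128(value):
    """Encode a non-negative integer as unsigned LEB128 bytes."""
    if value < 0:
        raise ValueError("ULEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F        # low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # last byte, high bit clear
            return bytes(out)


def encode_offsets(offsets):
    """Sort the offsets, then emit the first one followed by the deltas."""
    out = bytearray()
    prev = 0
    for off in sorted(offsets):
        out += uleb128(off - prev)
        prev = off
    return bytes(out)


def decode_offsets(data):
    """Inverse of encode_offsets: rebuild the sorted offset list."""
    offsets, current, value, shift = [], 0, 0, 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7               # continuation byte
        else:
            current += value         # delta from the previous offset
            offsets.append(current)
            value, shift = 0, 0
    return offsets


encoded = encode_offsets([1001, 1002, 1005, 1010, 1030])
print(len(encoded), encoded.hex())
```

Only the first value (1001) needs two bytes; each of the four deltas
fits in one, which is where the 6 bytes come from.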

If you had 10 instructions like I used above, then
you'd have 10 bytes plus a few extra bytes for the 
larger offsets.

It's much shorter if you can do without address-sized
fields, and it's easier on the linker to avoid 
lots of relocations.  But the down side is that
using relocations and offsets allows you to generate
the section data using assembly syntax without teaching
your assembler how to generate special dwarf data.
So that may be why dwarf has relied a little too heavily on
addresses and relocations for some things.  But that's
just a guess.

Now I noticed there was an issue that Paul Robinson 
mentioned on the intertubes where lldb prefers 
fixed-size dies.  The solution in the Studio compilers
puts the data block inside the die, but it works
just as well to factor the data out into a different
section the same way range lists work. It's a little 
less space efficient because of the address pointing
at the external section, but still better than range lists.