[llvm-dev] Range lists, zero-length functions, linker gc

David Blaikie via llvm-dev llvm-dev at lists.llvm.org
Fri May 29 13:21:34 PDT 2020


On Fri, May 29, 2020 at 12:52 PM Robinson, Paul <paul.robinson at sony.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Alexey Lapshin <alapshin at accesssoftek.com>
> > Sent: Friday, May 29, 2020 3:09 PM
> > To: Robinson, Paul <paul.robinson at sony.com>; Fangrui Song
> > <maskray at google.com>; David Blaikie <dblaikie at gmail.com>
> > Cc: Sriraman Tallam <tmsriram at google.com>; Wei Mi <wmi at google.com>; Adrian
> > Prantl <aprantl at apple.com>; Jonas Devlieghere <jdevlieghere at apple.com>;
> > Alexey Lapshin <a.v.lapshin at mail.ru>; Eric Christopher
> > <echristo at gmail.com>; peter.smith at arm.com; George Rimar
> > <grimar at accesssoftek.com>; llvm-dev at lists.llvm.org
> > Subject: Re: [llvm-dev] Range lists, zero-length functions, linker gc
> >
> >
> > > Subject: Re: [llvm-dev] Range lists, zero-length functions, linker gc
> > >
> > > On 2020-05-28, David Blaikie wrote:
> > > > >On Thu, May 28, 2020 at 2:52 PM Robinson, Paul <paul.robinson at sony.com> wrote:
> > > >
> > > > >> As has been mentioned elsewhere, Sony generally fixes up references
> > > > >> from debug info to stripped functions (of any length) using -1,
> > > > >> because that's a less-likely-to-be-real address than 0x0 or 0x1.
> > > > >> (0x0 is a typical base address for shared libraries, I'd think using
> > > > >> it has the potential to mislead various consumers.)  For
> > > > >> .debug_ranges we use -2, because both a 0/0 pair and a -1/-1 pair
> > > > >> have a reserved meaning in that section.
> > > >>
> > > >
> > > >Any harm in using -2 everywhere, for consistency?
> > >
> > > > When resolving a relocation, in certain cases we have to give an
> > > > undefined symbol a value.
> > > > This can happen with:
> > > >
> > > > * an undefined weak symbol
> > > > * an undefined global symbol in --noinhibit-exec mode (a buggy
> > > >   --gc-sections implementation can trigger this as well)
> > > > * a relocation referencing an undefined symbol in a non-SHF_ALLOC
> > > >   section
> > > >
> > > > We always respect the addend in a relocation entry for an
> > > > absolute/PC-relative (I can use "most" here) relocation
> > > > (R_ARM_THM_PC8, R_AARCH64_ADR_PREL_PG_HI21, R_X86_64_64,
> > > > local exec TLS relocation types, ...).
> > > > Ignoring the addend (using -2 everywhere) will break this consistency.
> > > >
> > > > The relocated code may do pointer subtraction, which would work if
> > > > addends were respected, but will break using -2 everywhere.
> >
> > >I suspect David meant "any harm to using -2 in all .debug_* sections?"
> > >and not literally everywhere.  Sony does special cases only for the
> > >.debug_* sections.
> >
> > >I've been meaning to propose that DWARF v6 reserve a special address for
> > >this kind of situation.  Whether the committee would be willing to make
> > >it be -1 or -2 for all targets, or make it target-defined, I don't know.
> >
> > >(Dreading the inevitable argument over whether addresses are signed or
> > >unsigned, or more to the point whether they wrap.  They've been unsigned
> > > >and wrapping was undefined on the small set of machines I'm familiar with.)
> > >Certainly the toolchain community would benefit from making it be the
> > >same everywhere.
> >
> > >Personally I'd vote for -1, and make pre-v5 .debug_loc/.debug_ranges
> > >sections be an extra-special case using -2.  We can (I hope) standardize
> > >on -1 for v6 onward, and document -1/-2 on the DWARF wiki as recommended
> > >practice for prior versions.
> >
> > Would it make sense to use "LowPC > HighPC" in the DWARF documentation as
> > the sign for that case, instead of -1 or -2?
> >
> > Or, more precisely: to indicate that an address range points into deleted
> > code, either a zero-length range or a LowPC > HighPC range should be used?
> >
> > A zero-length address range is already defined in the DWARF documentation.
> > LowPC > HighPC is currently not described. It could be documented and
> > used as a general representation instead of a concrete special value.
> >
> > An implementation could still use -2 for resolving relocations and it
> > would satisfy the above definition.
> >
> > Thank you, Alexey.
>
> For addresses that are part of a range, that sounds reasonable.
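
A minimal sketch of the rule Alexey proposes, as a consumer might apply
it. This assumes 64-bit unsigned addresses and a linker that resolves
both ends of a dead range to -2; MASK64, TOMBSTONE and is_dead_range are
hypothetical names for illustration, not part of DWARF, lld or any
debugger.

```python
MASK64 = (1 << 64) - 1      # assumption: 64-bit unsigned target addresses
TOMBSTONE = (-2) & MASK64   # assumption: "-2" as written by the linker


def is_dead_range(low_pc, high_pc):
    """Treat zero-length and inverted (low > high) ranges as deleted code."""
    return low_pc >= high_pc


# Both ends relocated to the tombstone: zero length, so the range is dead.
print(is_dead_range(TOMBSTONE, TOMBSTONE))  # True
# An ordinary function occupying 0x1000..0x1040 is live.
print(is_dead_range(0x1000, 0x1040))        # False
```

The check only looks at the two resolved values, so it relies on both
low_pc and high_pc actually being addresses the linker can rewrite.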

I think it'd still be tricky to work with even just considering ranges
for a few reasons:

* Ranges described in split DWARF by low_pc (an address) and high_pc
(data/length) - the high_pc is a plain constant, so it can't be fixed up.
* Ranges that aren't at the start of a section - eg: ("void f1() { }
nodebug void f2() { } void f3() { }" - without function sections, f3
will start and end at some offset relative to the base address of the
.text section - this means, for instance, that the low/high_pc of f3,
let's say low_pc was a relocatable address and high_pc was length and
f3 gets linker-gc'd, now the base address resolves to -2, but -2 +
offset (OK, I'm stretching a bit here, we aren't doing this yet - but
see my thread back in February or so, when I discussed the idea of
using base address+offset to reduce the number of relocations/size of
the address pool) wraps around and becomes <

>
> Addresses are not always part of a range, however. I can think of
> two cases where they do not:  DW_TAG_label points to a single
> instruction, not a range; and the .debug_line section doesn't really
> identify ranges, at least not directly.  I still think we'd want to
> specify a reserved value.

The new/third case here is DW_TAG_call_site, which uses DW_AT_call_pc to
identify the call instruction.

But, yeah - I think we'd want a blessed value, one that taints any
address computation you may do based on it (so it requires explicit
support in the debugger, so that it doesn't casually wrap around from
"max - 1" (aka "-2") back to a positive value just because you do
"addr+offset" on it in a DWARF expression or form, etc.).
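
As a rough illustration of the wrap-around concern, here is a minimal
sketch. It assumes 64-bit unsigned addresses and -2 as the blessed value;
MASK64, TOMBSTONE, naive_add and tainted_add are hypothetical names, not
APIs of LLVM, lld or any debugger.

```python
MASK64 = (1 << 64) - 1      # assumption: 64-bit unsigned target addresses
TOMBSTONE = (-2) & MASK64   # assumption: "-2", i.e. 0xfffffffffffffffe


def naive_add(addr, offset):
    """Plain wrapping add, as a consumer unaware of the tombstone would do."""
    return (addr + offset) & MASK64


def tainted_add(addr, offset):
    """Add that keeps the tombstone sticky instead of letting it wrap."""
    if addr == TOMBSTONE:
        return TOMBSTONE
    return (addr + offset) & MASK64


print(hex(naive_add(TOMBSTONE, 0x10)))    # 0xe
print(hex(tainted_add(TOMBSTONE, 0x10)))  # 0xfffffffffffffffe
print(hex(tainted_add(0x1000, 0x10)))     # 0x1010
```

With the naive add, the dead address turns into 0xe, which looks like a
small but plausible address; the tainted add keeps it recognizable.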

