[llvm-dev] Range lists, zero-length functions, linker gc

David Blaikie via llvm-dev llvm-dev at lists.llvm.org
Fri May 29 14:42:05 PDT 2020


On Fri, May 29, 2020 at 2:32 PM Alexey Lapshin
<alapshin at accesssoftek.com> wrote:
>
> >>There still exists another problem:
>
> >> DWARF4: "A range list entry (but not a base address selection or end of list entry) whose beginning and
> >>ending addresses are equal has no effect because the size of the range covered by such
> >>an entry is zero."
>
> >>DWARF5: "A bounded range entry whose beginning and ending address offsets are equal
> >>(including zero) indicates an empty range and may be ignored."
>
> >>These rules allow us to ignore zero-length address ranges. I.e., some tool reading DWARF is permitted to ignore related DWARF entries.
>
>
> >I agree it allows consumers to ignore that entry in the range list because that entry is zero-length/equivalent to not being present at all - I don't think that means consumers can ignore the DIE that refers to this range list. I think it's valid DWARF to have a CU that only describes types, without any code attached to it at all. Or for a subprogram that's been eliminated to still be used by a consumer for name lookup purposes - so the consumer can tell the user there is a function called "f1" and tell the user what parameter types, return type it has, etc - not ignore it entirely.
>
> It probably depends on interpretation, in which case it would be good to clarify that question in the DWARF standard.
> I think there is a difference between a CU that does not relate to any address and a CU that relates to an invalid
> address (deleted code). Probably, these two situations should be handled differently:
>
> 1. CU that only describes types without any code attached to it should not be ignored by the tools.
> 2. CU that relates to the deleted code could be removed/ignored by the tools.

I think it's probably best not to handle those two cases differently -
because there's no way to know if (2) has some types that might be
useful even if the code is eliminated, in the same way that (1) has
types that might be useful.
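
As a rough illustration of treating both cases the same way, here is a
minimal sketch of a consumer that skips empty range entries but keeps the
DIE itself available for lookup. The names AddressRange, SimpleDie,
effectiveRanges, coversCode and usableForNameLookup are hypothetical and
are not LLVM APIs; the model is deliberately simplified.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified model of a DIE and its range list.
struct AddressRange {
  std::uint64_t begin;
  std::uint64_t end; // exclusive
};

struct SimpleDie {
  std::string name;
  std::vector<AddressRange> ranges; // empty for a types-only entity
};

// Drop zero-length entries; per the DWARF wording quoted above they
// have no effect and may be ignored.
std::vector<AddressRange> effectiveRanges(const SimpleDie &die) {
  std::vector<AddressRange> out;
  for (const AddressRange &r : die.ranges) {
    assert(r.begin <= r.end && "malformed range entry");
    if (r.begin != r.end)
      out.push_back(r);
  }
  return out;
}

// True only if at least one non-empty range survives.
bool coversCode(const SimpleDie &die) {
  return !effectiveRanges(die).empty();
}

// Ignoring an empty range entry is not the same as ignoring the DIE:
// a named DIE stays usable for name/type lookup either way.
bool usableForNameLookup(const SimpleDie &die) {
  return !die.name.empty();
}
```

In this sketch a DIE with no ranges at all and a DIE whose only ranges
are zero-length end up in the same state: neither covers code, and both
can still be found by name.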

That said, I don't think the current state of affairs - of linkers
implementing the "usual unix linker-y semantics", dropping some object
files & thus dropping their associated debug info - is much of a
problem. One can probably
create cases where, even within that semantic (without the extra lld
feature that was especially problematic) some strange things happen
that might not be ideal - but I think I'd have to see those cases
before I'd worry too much about that situation.



> >> In that case, essential descriptions could be ignored. That problem could happen with -flto=thin;
> >> see the example at https://reviews.llvm.org/D54747#1503720 . In this example, all type definitions except one were
> >> replaced with declarations by ThinLTO. The definition that was left is in a piece of debug info related to
> >> deleted code. According to the zero-length rule, that definition could be ignored, and as a result incomplete debug
> >> info could be used.
>
>
> > Yeah, I think the bug there is the linker dropping object files just because they have no executable
> > code in them - I think the patch that did that was reverted, if I'm remembering correctly.
>
> Right. The patch was reverted. But that problem still applies to any tool that tries to remove debug
> info related to garbage-collected code. For example, that problem exists for dsymutil:
>
> $ cat a.cpp
> int f();
> int main() {
>   return f();
> }
>
> $ cat b.cpp
> struct Foo {
>   int x, y;
> };
> int f() {
>   volatile Foo var;
>   var.x = 13;
>   var.y = 42;
>   return var.x + var.y;
> }
>
> $ clang++ a.cpp b.cpp -O -g -flto=thin -Wl,-dead_strip
> $ dsymutil a.out
> $ llvm-dwarfdump -a a.out.dSYM/Contents/Resources/DWARF/a.out | grep Foo
>    DW_AT_type (0x00000000000000b1 "volatile Foo")
>    DW_AT_type (0x00000000000000b6 "Foo")
>    DW_AT_name ("Foo")   <<<<<<<<<<<<<<<<<< that is a declaration (the definition is removed)
> 0x000000af: "Foo"

Honestly, ThinLTO is a weird beast when it comes to linker semantics
and debug info & could probably do with a bunch of tweaking, perhaps
especially in the overlap with the Apple/MachO linking semantics. I'd
say there's room for improvement there, but I don't know that that
improvement requires any word from the DWARF committee/changes to the
DWARF standard.

> I.e., we probably need to clarify that question in the standard: whether it is allowed to
> remove/ignore DIEs related to deleted code,
> so that tools (dsymutil/DWARF-aware linker) handle such situations correctly.
>
> If it would be necessary to analyze debug info related to deleted code
> (whether it contains something used in other parts of debug info)
> then the linking process will become even slower.

Yep - I'd rather avoid that & probably address such issues by changing
how ThinLTO works, to ensure it doesn't drop type information (there
are solutions here other than changing linker semantics -
ThinLTO could be changed to produce different object files to begin
with - importing/exporting types to different files to ensure they are
preserved into the final linked binary).
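
For a sense of what that extra analysis would involve, here is a minimal
sketch under simplifying assumptions: each type DIE is reduced to a name,
a declaration flag, and whether its CU still has live code. TypeEntry and
definitionsOnlyInDeadCUs are hypothetical names, not dsymutil or LLVM
structures, and a real tool would have to walk actual DIE trees rather
than a flat list.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical summary of one type DIE.
struct TypeEntry {
  std::string name;
  bool isDeclaration; // true if the DIE is only a declaration
  bool cuHasLiveCode; // false if the owning CU's code was removed
};

// Return the names of types that live CUs only declare and whose
// definition exists only in CUs whose code was removed. Dropping those
// CUs wholesale would leave the debug info incomplete.
std::set<std::string>
definitionsOnlyInDeadCUs(const std::vector<TypeEntry> &entries) {
  std::set<std::string> declaredLive, definedLive, definedDead;
  for (const TypeEntry &e : entries) {
    if (e.cuHasLiveCode)
      (e.isDeclaration ? declaredLive : definedLive).insert(e.name);
    else if (!e.isDeclaration)
      definedDead.insert(e.name);
  }

  std::set<std::string> result;
  for (const std::string &name : declaredLive)
    if (!definedLive.count(name) && definedDead.count(name))
      result.insert(name);
  return result;
}
```

Applied to the a.cpp/b.cpp case above (assuming b.cpp's CU ends up with
no live code after dead stripping), the only definition of Foo would be
reported as living in a dead CU. Every type entry has to be visited to
find that out, which is the extra link-time cost in question.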

> It would be better not to allow such closely coupled debug info to be generated.

That would come at the cost of increasing duplication, potentially,
which I'd like to avoid.

- Dave

