[llvm-dev] Range lists, zero-length functions, linker gc

Mon Jun 1 15:03:00 PDT 2020

>>>I agree it allows consumers to ignore that entry in the range list because that entry is zero-length/equivalent
>>>to not being present at all - I don't think that means consumers can ignore the DIE that refers to this range list.
>>>I think it's valid DWARF to have a CU that only describes types, without any code attached to it at all. Or for a
>>>subprogram that's been eliminated to still be used by a consumer for name lookup purposes - so the consumer
>>>can tell the user there is a function called "f1" and tell the user what parameter types, return type it has, etc
>>>- not ignore it entirely.
>>>
>> It probably depends on interpretation, so it would be good to clarify that question in the DWARF standard.
>> I think there is a difference between a CU that does not relate to any address and a CU that relates to an
>> invalid address (deleted code). These two situations should probably be handled differently:
>>
>> 1. CU that only describes types without any code attached to it should not be ignored by the tools.
>> 2. CU that relates to the deleted code could be removed/ignored by the tools.

David>I think it's probably best not to handle those two cases differently -
David>because there's no way to know if (2) has some types that might be
David>useful even if the code is eliminated, in the same way that (1) has
David>types that might be useful.

David>That said, I don't think the current state of affairs - of linkers
David>implementing the "usual unix linker-y semantics" dropping some object
David>files & thus dropping their associated debug info. One can probably
David>create cases where, even within that semantic (without the extra lld
David>feature that was especially problematic) some strange things happen
David>that might not be ideal - but I think I'd have to see those cases
David>before I'd worry too much about that situation.

I agree that, currently, "there's no way to know if (2) has some types that might be
useful even if the code is eliminated, in the same way that (1) has
types that might be useful". That could probably be changed, though.

The problem is that optimizing closely coupled debug info is a more complex
and time-consuming task. It would probably be possible to achieve the same
debug info size with a simpler structure and, as a result, a quicker debug info
optimizing process.

Having a rule like "CU that relates to the deleted code could be
removed/ignored by the tools" would motivate DWARF producers to generate
DWARF with a particular structure, so that debug info stays consistent even
if some CU is removed. Without such an agreement, nothing prevents producers
from generating closely coupled debug info.
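
To make the proposed rule concrete, here is a minimal sketch of how a tool could
tell case (1) from case (2). All names (AddressRange, CompileUnit, classify,
mayRemove) are hypothetical and not taken from any existing tool, and it assumes
that deleted code shows up as zero-length ranges, as in the zero-length rule
discussed in this thread:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of the address ranges attached to a CU.
struct AddressRange {
  std::uint64_t LowPC;
  std::uint64_t HighPC; // exclusive end; LowPC == HighPC means zero-length
};

struct CompileUnit {
  std::vector<AddressRange> Ranges; // empty: the CU describes no code at all
};

enum class CUKind { TypesOnly, DeletedCode, LiveCode };

// Assumption: a CU whose ranges are all zero-length relates only to
// garbage-collected code. A CU with no ranges is a types-only CU.
inline CUKind classify(const CompileUnit &CU) {
  if (CU.Ranges.empty())
    return CUKind::TypesOnly;
  bool AllZeroLength =
      std::all_of(CU.Ranges.begin(), CU.Ranges.end(),
                  [](const AddressRange &R) { return R.LowPC == R.HighPC; });
  return AllZeroLength ? CUKind::DeletedCode : CUKind::LiveCode;
}

// Under the proposed rule, only case (2) may be removed/ignored.
inline bool mayRemove(const CompileUnit &CU) {
  return classify(CU) == CUKind::DeletedCode;
}
```

Note that this check looks only at the CU's own ranges. It is safe only if
producers guarantee that nothing outside such a CU depends on its contents,
which is exactly the agreement the rule is meant to establish.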

>>>> In that case, essential descriptions could be ignored. That problem could happen with -flto=thin; see the
>>>> example at https://reviews.llvm.org/D54747#1503720 . In this example, all type definitions except one were
>>>> replaced with declarations by ThinLTO. The definition that was left is in a piece of debug info related to
>>>> deleted code. According to the zero-length rule, that definition could be ignored, and in the end incomplete
>>>> debug info could be used.
>>
>>
>>> Yeah, I think the bug there is the linker dropping object files just because they have no executable
>>> code in them - I think the patch that did that was reverted, if I'm remembering correctly.
>>
>> Right. The patch was reverted. But that problem is still relevant for any tool which tries to remove debug
>> info related to garbage-collected code. For example, that problem exists for dsymutil:
>>
>> $ cat a.cpp
>> int f();
>> int main() {
>>   return f();
>> }
>>
>> $ cat b.cpp
>> struct Foo {
>>   int x, y;
>> };
>> int f() {
>>   volatile Foo var;
>>   var.x = 13;
>>   var.y = 42;
>>   return var.x + var.y;
>> }
>>
>> $ clang++ a.cpp b.cpp -O -g -flto=thin -Wl,-dead_strip
>> $ dsymutil a.out
>> $ llvm-dwarfdump -a a.out.dSYM/Contents/Resources/DWARF/a.out | grep Foo
>>    DW_AT_type (0x00000000000000b1 "volatile Foo")
>>    DW_AT_type (0x00000000000000b6 "Foo")
>>    DW_AT_name ("Foo")   <<<<<<<<<<<<<<<<<< that is a declaration(definition is removed)
>> 0x000000af: "Foo"

David>Honestly, ThinLTO is a weird beast when it comes to linker semantics
David>and debug info & could probably do with a bunch of tweaking, perhaps
David>especially in the overlap with the Apple/MachO linking semantics. I'd
David>say there's room for improvement there, but I don't know that that
David>improvement requires any word from the DWARF committee/changes to the
David>DWARF standard.

>> I.e., we probably need to clarify that question in the standard: whether it is allowed to
>> remove/ignore DIEs related to deleted code,
>> so that tools (dsymutil/DWARF-aware linker) handle such situations correctly.
>>
>> If it were necessary to analyze debug info related to deleted code
>> (to check whether it contains something used in other parts of the debug info),
>> then the linking process would become even slower.

David>Yep - I'd rather avoid that & probably address such issues by changing
David>how ThinLTO works, to ensure it doesn't drop type information (there
David>are other solutions here other than changing linker semantics -
David>ThinLTO could be changed to produce different object files to begin
David>with - importing/exporting types to different files to ensure they are
David>preserved into the final linked binary)

>> It would be better to disallow generating such closely coupled debug info.

David>That would come at the cost of increasing duplication, potentially,
David> which I'd like to avoid.

Probably it would not. As you mentioned in the previous comment, it is possible
to change how the DWARF producer works. It could produce different DWARF
structures, avoiding duplication on the one hand and keeping the DWARF structure
simple on the other, so that it could be easily parsed by a DWARF optimizing tool
(DWARF-aware linker).

Thank you, Alexey.