[PATCH] D59553: [LLD][ELF][DebugInfo] llvm-symbolizer shows incorrect source line info if --gc-sections used

Mon May 6 09:31:43 PDT 2019

avl added a comment.

> I've some previous discussions PR37212 PR37891 etc. -2 or -1 may be benign to some non-allocated sections like .debug.*, but may not be good to others.

Could you explicitly describe that case, please ? i.e. When using UINT64_MAX-1 as start of address range for relocations pointing to deleted code would not be good for others ?

> The consumers may get used to treat 0 DW_AT_low_pc special but the patch may make some consumers code more special values.

that would be a error. Treating 0 DW_AT_low_pc as a special value would not be correct because there is no such documented special value. Consumers should not rely that linker resolves the relocation to a removed section as a zero.

this patch does not add additional special value. consumer code would not be required to add more special values.

> lld doesn't implement the CB_PRETEND logic in GNU linkers:
> 
>   gold CB_PRETEND https://sourceware.org/git/?p=binutils-gdb.git;a=blob;hb=9d6d4be89d12747a92629ed1bde1d423e2831de1;f=gold/target-reloc.h#l414
>   ld.bfd action_discarded PRETEND https://sourceware.org/git/?p=binutils-gdb.git;a=blob;hb=9d6d4be89d12747a92629ed1bde1d423e2831de1;f=bfd/elflink.c#l10851
> 
> Maybe you can raise an issue on https://sourceware.org/bugzilla , asking if the relocation can be resolved to -2 if the PRETEND logic fails, or asking if addr2line should be fixed instead. I raised some issues in the weekend and the >maintainers seem super responsive.

PRETEND case is a related but separate issue. PRETEND is about "assign address range of selected COMDAT section to the references to deleted COMDAT section".
This patch is about - "assign not-clashing address value to the references for the deleted code". 
I can raise an issue there but I do not think this discussion should depend on https://sourceware.org/bugzilla.

> I think it is quite possible that we just need the llvm-symbolizer fix D60470 <https://reviews.llvm.org/D60470> and an addr2line fix to make everything work out. Debuggers have had the heuristics for years.

I do not agree with this. 
llvm-symbolizer and addr2line just do not have enough information to properly select correct address range.

for the platform where image-base starts from 0 - D60470 <https://reviews.llvm.org/D60470> will not help to select correct address range. Right ?

I think there are three levels of solutions for this problem :

1. Best one: Delete wrong debug info. In this case the problem just would not arise.
2. Good enough: Set not clashing address range for code pointing to deleted sections while resolving relocations in linker. linker knows which address range is correct. So it could unambiguously assign proper value.
3. Less powerful: implement heuristics in  llvm-symbolizer and addr2line.  heuristics do not work for all cases. Thus wrong behavior would still exists.

>> But there are other platforms/compilation languages/linking options/linkers where this problem happens more often.

> If you happen to know one combination, please elaborate :)

sure.

1. x86 platform and -image-base=0 makes this problem to happen much more often.
2. there are quite many platforms which already has -image-base starting from zero. This is often used for embedded systems. as an example - please check message from @jhenderson https://reviews.llvm.org/D60470#1462375 . In such case this problem happens much more often.

3. there are programming languages not using exceptions intensively. So they do not create eh_frame section at all or have it very small. This would lead to not shift start of text section much (as in your explanation why the problem is less important for lld).

> The image base doesn't matter (is zero) for -shared/-pie. I've shared my thoughts why this doesn't likely cause problems to lld. 
>  In lld, the PF_R PT_LOAD is placed before the PF_R|PF_X PT_LOAD (text segment). If you have a compilation unit that is sufficiently 
>  large (large DW_AT_high_pc, 0 DW_AT_low_pc), it is reasonable to have a sufficiently large PF_R PT_LOAD. 
>  Then the range will not collide with a good compilation unit with large DW_AT_low_pc.

> On the GNU linkers' side, ld.bfd -z separate-code is a recent addition and it doesn't move the large .rodata etc before .text. 
>  gold hasn't implemented -z separate-code. They are more vulnerable to the issue.

I do not really understand why we are discussing that question? 
Even if that problem is less important for lld than for others linkers - it still exists.
So we need to patch lld anyway.

Nevertheless above description based on the fact that :

> " If you have a compilation unit that is sufficiently 
>  large (large DW_AT_high_pc, 0 DW_AT_low_pc), it is reasonable to have a sufficiently large PF_R PT_LOAD. 
>  Then the range will not collide with a good compilation unit with large DW_AT_low_pc."

taking into account that :

ld sets default image base to 0x400000 for x86_64.
gold default image base is 0x400000 for x86_64.
lld default image base is 0x200000 for x86_64

1. size of sections put before .text section by lld should be greater than  0x200000 to have a benefit comparing with other linkers. In case size of sections would be less than 0x200000 then lld would have a problem but other linkers would not.

2. Actually it is not correct to link size of .text section and size of previously put sections as in above assumption. Size of text sections should be compared with size of deleted code. if there were deleted big piece of data then broken range could be big - 0x0-0x300000 f.e; But correct code could be smaller(size of text section is 1000 f.e) -  in this case there would be overlapping address ranges in-spite of the fact that lld put some sections before .text section.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59553/new/

https://reviews.llvm.org/D59553