[lld] [llvm] [Symbolizer] Support for Missing Line Numbers. (PR #82240)

via llvm-commits llvm-commits at lists.llvm.org
Thu May 16 10:08:57 PDT 2024


================
@@ -0,0 +1,261 @@
+# REQUIRES: x86-registered-target
+
+# RUN: clang -O3 -gline-tables-only -T%S/linker-script.ld --target=x86_64-pc-linux %s -o %t.o
----------------
ampandey-1995 wrote:

> I hope that we don't need compilation or linking. 
I am also not in favour of compiling or linking, but to cover the case that sequence boundaries must not be crossed, we need the linker to be invoked.

Let me explain this in detail.
Whether the assembly (containing two sequences) is handwritten or generated, llvm-mc always maps the start of each sequence at offset zero.

I checked the sequence addresses by running the commands below on approximate-line-handcrafted.s (note: no .loc entries were edited here).
 
```llvm-mc -g -filetype=obj -triple=x86_64-pc-linux approximate-line-handcrafted.s -o approximate-line-handcrafted.o``` 

```llvm-dwarfdump --debug-line approximate-line-handcrafted.o``` outputs this line table:

```
Address            Line   Column File   ISA Discriminator OpIndex Flags
------------------ ------ ------ ------ --- ------------- ------- -------------
**0x0000000000000000**      1     78      1   0             0       0  is_stmt prologue_end
0x0000000000000006      1     78      1   0             0       0  is_stmt end_sequence
**0x0000000000000000**      2     49      1   0             0       0  is_stmt prologue_end
0x0000000000000003      2     39      1   0             0       0
0x0000000000000010      3      0      1   0             0       0  is_stmt
0x0000000000000012      3     49      1   0             0       0  is_stmt prologue_end
0x0000000000000014      3     39      1   0             0       0
0x0000000000000020      4      0      0   0             0       0  is_stmt
0x0000000000000021      7      2      0   0             0       0  is_stmt prologue_end
0x0000000000000034      8      2      0   0             0       0  is_stmt
0x0000000000000047      9      2      0   0             0       0  is_stmt
0x000000000000005a     10      3      0   0             0       0  is_stmt
0x000000000000005c     10      3      0   0             0       0  epilogue_begin
0x000000000000005e     10      3      0   0             0       0  end_sequence
```
In the output above you can see that both sequences start at offset 0x0000000000000000. When I run llvm-symbolizer on addresses in that range, I get odd behaviour like the following.

```
~$ llvm-symbolizer --obj=approximate-line-handcrafted.o 0x04
.L.str
/tmp/test/./definitions.h:2:39

~$ llvm-symbolizer --obj=approximate-line-handcrafted.o 0x1
.L.str
/tmp/test/./definitions.h:2:49
```

This happens when an object file has two sequences that are both mapped at offset 0x00. It did not happen with approximate-line-generated.s because that file has a single sequence. When an object has multiple sequences, llvm-mc doesn't offset the sections, because no memory mapping takes place at that stage; that job belongs to the linker. So, to cover this part, I had to invoke the linker with a dummy linker script (which is static for all targets). The linker script provides the memory mapping by placing the sections at a particular offset (here, for approximate-line-handcrafted.s, at 0x500000).
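To make the ambiguity concrete, here is a minimal Python sketch (not part of the PR) that groups line-table rows into sequences and reports address ranges that overlap. The helper names `split_sequences` and `overlapping` are hypothetical. The addresses are copied from the llvm-dwarfdump output of the unlinked object, and the rows of each sequence are assumed to be in address order.

```python
from typing import List, Tuple

Row = Tuple[int, bool]    # (address, is_end_sequence)
Range = Tuple[int, int]   # [low, high) address range of one sequence


def split_sequences(rows: List[Row]) -> List[Range]:
    """Group rows into one [low, high) range per sequence."""
    ranges = []
    start = None
    for address, end_sequence in rows:
        if start is None:
            start = address
        if end_sequence:
            ranges.append((start, address))
            start = None
    return ranges


def overlapping(ranges: List[Range]) -> List[Tuple[Range, Range]]:
    """Return every pair of sequences whose address ranges intersect."""
    pairs = []
    for i in range(len(ranges)):
        for j in range(i + 1, len(ranges)):
            a_lo, a_hi = ranges[i]
            b_lo, b_hi = ranges[j]
            if a_lo < b_hi and b_lo < a_hi:
                pairs.append((ranges[i], ranges[j]))
    return pairs


# Rows from the llvm-mc output: both sequences start at 0x0.
unlinked = [
    (0x00, False), (0x06, True),
    (0x00, False), (0x03, False), (0x10, False), (0x12, False),
    (0x14, False), (0x20, False), (0x21, False), (0x34, False),
    (0x47, False), (0x5A, False), (0x5C, False), (0x5E, True),
]

for first, second in overlapping(split_sequences(unlinked)):
    print(f"overlap: [{first[0]:#x}, {first[1]:#x}) and "
          f"[{second[0]:#x}, {second[1]:#x})")
```

Any address in the shared range, such as 0x1 or 0x4, belongs to both sequences, so a lookup cannot tell which one was meant.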

So with the commands below:
```
clang -O3 -T linker-script.ld -gline-tables-only --target=x86_64-pc-linux approximate-line-handcrafted.s -o approximate-line-handcrafted.o

llvm-dwarfdump --debug-line approximate-line-handcrafted.o

**Output Line table**
Address            Line   Column File   ISA Discriminator OpIndex Flags
------------------ ------ ------ ------ --- ------------- ------- -------------
0x0000000000500790      1     78      1   0             0       0  is_stmt prologue_end
0x0000000000500796      1     78      1   0             0       0  is_stmt end_sequence
0x00000000005000f0      2     49      1   0             0       0  is_stmt prologue_end
0x00000000005000f3      2     39      1   0             0       0
0x0000000000500100      3      0      1   0             0       0  is_stmt
0x0000000000500102      3     49      1   0             0       0  is_stmt prologue_end
0x0000000000500104      3     39      1   0             0       0
0x0000000000500110      4      0      0   0             0       0  is_stmt
0x0000000000500111      7      2      0   0             0       0  is_stmt prologue_end
0x0000000000500124      8      2      0   0             0       0  is_stmt
0x0000000000500137      9      2      0   0             0       0  is_stmt
0x000000000050014a     10      3      0   0             0       0  is_stmt
0x000000000050014c     10      3      0   0             0       0  epilogue_begin
0x000000000050014e     10      3      0   0             0       0  end_sequence

~$ llvm-symbolizer --obj=approximate-line-handcrafted.o 0x5000f3
add
/tmp/test/./definitions.h:2:39

~$ llvm-symbolizer --obj=approximate-line-handcrafted.o 0x500790
dummy_function
/tmp/test/./definitions.h:1:78

```   

Which is great. Here I can easily set some of the line table entries to zero and invoke llvm-symbolizer on unique addresses to cover the case that sequence boundaries must not be crossed.
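The coverage goal described above can be sketched in Python as follows. This is only an illustration, not the implementation in this PR: the `Row` type and the `approximate_line` helper are hypothetical, the backward search direction is an assumption, and two line values from the linked table (at 0x5000f0 and 0x500124) are zeroed by hand. The point it shows is that the search for a non-zero line stays inside the sequence that contains the address.

```python
from typing import List, NamedTuple, Optional


class Row(NamedTuple):
    address: int
    line: int
    end_sequence: bool = False


def split_sequences(rows: List[Row]) -> List[List[Row]]:
    """Split the flat row list into sequences at each end_sequence row."""
    sequences, current = [], []
    for row in rows:
        current.append(row)
        if row.end_sequence:
            sequences.append(current)
            current = []
    return sequences


def approximate_line(rows: List[Row], address: int) -> Optional[int]:
    """Return the nearest earlier non-zero line within the same sequence."""
    for seq in split_sequences(rows):
        # A sequence covers [first address, end_sequence address).
        if not (seq[0].address <= address < seq[-1].address):
            continue
        # Last row at or before the address (rows assumed address-ordered).
        idx = max(i for i, r in enumerate(seq) if r.address <= address)
        # Walk backwards, but never past the start of this sequence.
        for row in reversed(seq[: idx + 1]):
            if row.line != 0:
                return row.line
        return None
    return None


rows = [
    Row(0x500790, 1), Row(0x500796, 1, True),
    Row(0x5000F0, 0),  # hypothetically zeroed (was line 2)
    Row(0x5000F3, 2), Row(0x500100, 3), Row(0x500111, 7),
    Row(0x500124, 0),  # hypothetically zeroed (was line 8)
    Row(0x500137, 9), Row(0x50014E, 10, True),
]

print(approximate_line(rows, 0x500130))  # falls back to the row at 0x500111
print(approximate_line(rows, 0x5000F1))  # no earlier row in this sequence
```

In the second lookup the first row of the sequence has line 0, and the sketch returns `None` instead of borrowing line 1 from the preceding `dummy_function` sequence.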

All I am saying is that invoking ```llvm-mc``` alone doesn't work with multiple sequences, because every sequence starts at the default offset 0x00. If there is any x86_64 instruction that can set the offsets of the sequences correctly (I didn't find one), then I could make this work without the hassle of calling the linker.


https://github.com/llvm/llvm-project/pull/82240

