[clang] [llvm] Add option to generate additional debug info for expression dereferencing pointer to pointers. (PR #81545)

Mon Apr 1 17:06:19 PDT 2024

huangjd wrote:

> > > Reading LLVM IR lit CHECK lines from clang codegen is a bit difficult - could you include some simple examples (perhaps from the new clang tests in this patch) showing the DWARF output just as comments in this review for something more easily glanceable?
> > 
> > 
> > Attached is the output of the following command
> > `clang ~/llvm-project/clang/test/CodeGenCXX/debug-info-ptr-to-ptr.cpp -fdebug-info-for-pointer-type -g2 -S -O3 -o /tmp/debug-info-ptr-to-ptr.txt`
> > [debug-info-ptr-to-ptr.txt](https://github.com/llvm/llvm-project/files/14659111/debug-info-ptr-to-ptr.txt)
> 
> Thanks - OK, so this only applies to intermediate structures and arrays (is it useful for arrays? You can't really reorder them - learning that the hot part of an array is the 5th-10th element might be of limited (or at least sufficiently different from the struct layout stuff) value - and it's a more dynamic property/has a runtime parameter component that might be harder to use?)
> 
> Do you have size impact numbers for this? I wouldn't put this under a (possibly cc1) flag for now unless we've got some pretty compelling data that this doesn't substantially change debug info size, which would be a bit surprising to me - I'd assume this would be quite expensive, but it's just a guess.

To clarify, LLVM already supported the following case:
`int foo(A* a) { return a->i; }` Here the type of variable `a` is emitted, and for the instruction `mov 8(%rdi), %rax` (assuming the offset of `i` is 8), the debug info emitted for this instruction associates `%rdi` with `a`. By looking up the data layout in `A`'s type info, we know which field is accessed.

This patch handles two cases that were not previously supported.
1. `int foo(void* a) { return ((A*)a)->i; }` Previously, only the debug info for the void pointer `a` was emitted. It is still associated with `%rdi`, but because its type is `void*` we could not deduce what is being accessed by that instruction. With this patch, a pseudo variable is emitted in addition; it is also associated with `%rdi` and carries the correct type info for traversing the member expression.

2. `int foo(B* b) { return b->a->i; }` Previously, only the debug info for `b` was emitted, not for the intermediate value `b->a`, so the memory operand of the second `mov` instruction could not be associated with any variable. With this patch, a pseudo variable is emitted for an intermediate value if it is used as the pointer operand of a member expression.
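
For reference, the already-supported case and the two new cases can be put side by side in one small translation unit. This is only a sketch: the layouts of `A` and `B` and the function names are assumptions made up for illustration (the `pad` member exists solely so that `i` lands at offset 8, matching the `mov 8(%rdi), %rax` example), and it is not taken from the tests in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layout: `pad` only exists so that `i` sits at offset 8.
struct A {
  std::int64_t pad;
  int i;
};
static_assert(offsetof(A, i) == 8, "example assumes i is at offset 8");

// Hypothetical outer type holding a pointer to A.
struct B {
  A *a;
};

// Already supported: `a` has type A*, so the load from 8(%rdi) can be
// mapped to field `i` through A's type info.
int direct(A *a) { return a->i; }

// Case 1: the declared variable is void*, so its own type info says
// nothing about which field is accessed after the cast.
int through_cast(void *a) { return static_cast<A *>(a)->i; }

// Case 2: the second load goes through the intermediate value `b->a`,
// which is not a named variable in the source.
int through_chain(B *b) { return b->a->i; }

// Small sanity check that all three forms read the same field.
inline void self_check() {
  A x{0, 42};
  B y{&x};
  assert(direct(&x) == through_cast(&x));
  assert(direct(&x) == through_chain(&y));
}
```

Compiling something like this with the command quoted above and comparing the output with and without `-fdebug-info-for-pointer-type` is one way to look at the pseudo variables for cases 1 and 2.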

It should apply to arrays too, as long as the expression actually ends up in an instruction like `mov 8(%rdi), %rax`. I have test cases for this, and the assembly dump also shows that the memory operand is correlated with the pseudo variable. Note that in most use cases the presence of an array is irrelevant, because we are not type casting the array element itself (`((A&)foo[i]).member`, which is generally invalid); instead we type cast the pointer (`((A*) foo[i])->member`). In that case it doesn't matter what is being cast, because it is just case 1.
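
A minimal sketch of the array pattern described above, assuming a hypothetical `A` with a single `member` field and an array of untyped pointers named `foo` (neither comes from the patch's tests). The point is only that the cast is applied to the pointer loaded from the array, so the access has the same shape as case 1.

```cpp
#include <cassert>

// Hypothetical type, for illustration only.
struct A {
  int member;
};

// The array element is a void*; the cast is applied to that pointer,
// not to the array element as an object, so this is case 1 again.
int from_pointer_array(void **foo, int i) {
  return static_cast<A *>(foo[i])->member;
}

// Small sanity check of the access pattern.
inline void self_check_array() {
  A x{7};
  void *foo[1] = {&x};
  assert(from_pointer_array(foo, 0) == 7);
}
```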

As for the size impact, I believe @namhyung did some measurements when building the Linux kernel, and the impact was not significant.

https://github.com/llvm/llvm-project/pull/81545