[LLVMdev] Get precise line/column debug info from LLVM IR

Wed Apr 22 18:20:23 PDT 2015

Try upgrading :) A recent trunk build gives the store the same location as
the variable's llvm.dbg.declare:

dzur:~/tmp> ~/builds/build-llvm/Debug+Asserts/bin/clang -g -S -emit-llvm -o - foo.c | grep "\!22"
  call void @llvm.dbg.declare(metadata i32* %f, metadata !21, metadata !13), !dbg !22
  store i32 %add, i32* %f, align 4, !dbg !22
!22 = !MDLocation(line: 5, column: 12, scope: !4)
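
If you want the same lookup as the grep above but keyed on line and column
rather than on a metadata id, something along these lines might work. It is
only a sketch in plain standard C++ that scans textual IR; it does not use any
LLVM API. `findByLocation` is a hypothetical name, and the regexes assume the
`!MDLocation(line: ..., column: ...` spelling printed above (the `DI`
alternative is there for the later `!DILocation` spelling).

```cpp
#include <algorithm>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of LLVM): returns the IR lines whose !dbg
// attachment refers to a location node with the given line and column.
std::vector<std::string> findByLocation(const std::string &ir, unsigned line,
                                        unsigned column) {
  // Matches e.g. "!22 = !MDLocation(line: 5, column: 12, scope: !4)".
  static const std::regex locRe(
      R"(^(!\d+) = !(?:MD|DI)Location\(line: (\d+), column: (\d+),)");
  // Matches a trailing "!dbg !22" attachment on an instruction.
  static const std::regex dbgRe(R"(!dbg (!\d+)\s*$)");

  // Split the IR text into lines once so both passes can reuse them.
  std::vector<std::string> lines;
  std::istringstream in(ir);
  for (std::string l; std::getline(in, l);)
    lines.push_back(l);

  // Pass 1: collect the ids of location nodes at the requested position.
  std::vector<std::string> ids;
  std::smatch m;
  for (const std::string &l : lines)
    if (std::regex_search(l, m, locRe) &&
        std::stoul(m[2].str()) == line && std::stoul(m[3].str()) == column)
      ids.push_back(m[1].str());

  // Pass 2: keep every line whose !dbg attachment names one of those ids.
  std::vector<std::string> hits;
  for (const std::string &l : lines)
    if (std::regex_search(l, m, dbgRe) &&
        std::find(ids.begin(), ids.end(), m[1].str()) != ids.end())
      hits.push_back(l);
  return hits;
}
```

Fed the full IR for foo.c, `findByLocation(ir, 5, 12)` should return the two
lines grep printed above (the dbg.declare call and the store), assuming each
instruction sits on a single line as clang emits it.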

On Wed, Apr 22, 2015 at 6:13 PM Pablo González de Aledo <
pablo.aledo at gmail.com> wrote:

> I am trying to locate instructions in an LLVM Pass by line and column
> number (reported by a third-party tool) to instrument them. To achieve
> this, I am compiling my source files with `clang -g -O0 -emit-llvm` and
> looking for the information in the metadata using this code:
>
>     const DebugLoc &location = instruction->getDebugLoc();
>     // location.getLine()
>     // location.getCol()
>
> Unfortunately, this information is absolutely imprecise. Consider the
> following implementation of the Fibonacci function:
>
>     unsigned fib(unsigned n) {
>         if (n < 2)
>             return n;
>
>         unsigned f = fib(n - 1) + fib(n - 2);
>         return f;
>     }
>
> I would like to locate the single LLVM instruction corresponding to the
> assignment `unsigned f = ...` in the resulting LLVM IR. I am not interested
> in all the calculations of the right-hand side. The generated LLVM block
> including relevant debug metadata is:
>
>     [...]
>
>     if.end:                                           ; preds = %entry
>       call void @llvm.dbg.declare(metadata !{i32* %f}, metadata !17), !dbg !18
>       %2 = load i32* %n.addr, align 4, !dbg !19
>       %sub = sub i32 %2, 1, !dbg !19
>       %call = call i32 @fib(i32 %sub), !dbg !19
>       %3 = load i32* %n.addr, align 4, !dbg !20
>       %sub1 = sub i32 %3, 2, !dbg !20
>       %call2 = call i32 @fib(i32 %sub1), !dbg !20
>       %add = add i32 %call, %call2, !dbg !20
>       store i32 %add, i32* %f, align 4, !dbg !20
>       %4 = load i32* %f, align 4, !dbg !21
>       store i32 %4, i32* %retval, !dbg !21
>       br label %return, !dbg !21
>
>     [...]
>
>     !17 = metadata !{i32 786688, metadata !4, metadata !"f", metadata !5, i32 5, metadata !8, i32 0, i32 0} ; [ DW_TAG_auto_variable ] [f] [line 5]
>     !18 = metadata !{i32 5, i32 11, metadata !4, null}
>     !19 = metadata !{i32 5, i32 15, metadata !4, null}
>     !20 = metadata !{i32 5, i32 28, metadata !4, null}
>     !21 = metadata !{i32 6, i32 2, metadata !4, null}
>     !22 = metadata !{i32 7, i32 1, metadata !4, null}
>
> As you can see, the metadata `!dbg !20` of the `store` instruction points
> to **line 5 column 28**, which is the call to `fib(n - 2)`. Even worse, the
> add operation and the subtraction `n - 2` both also point to that function
> call, identified by `!dbg !20`.
>
> Interestingly, the Clang AST emitted by `clang -Xclang -ast-dump
> -fsyntax-only` has all that information. Thus, I suspect that it is somehow
> lost during the code generation phase. It seems that during code generation
> Clang reaches some internal sequence point and associates all following
> instructions to that position until the next sequence point (e.g. function
> call) occurs. For completeness, here is the declaration statement in the
> AST:
>
>     |-DeclStmt 0x7ffec3869f48 <line:5:2, col:38>
>     | `-VarDecl 0x7ffec382d680 <col:2, col:37> col:11 used f 'unsigned int' cinit
>     |   `-BinaryOperator 0x7ffec3869f20 <col:15, col:37> 'unsigned int' '+'
>     |     |-CallExpr 0x7ffec382d7e0 <col:15, col:24> 'unsigned int'
>     |     | |-ImplicitCastExpr 0x7ffec382d7c8 <col:15> 'unsigned int (*)(unsigned int)' <FunctionToPointerDecay>
>     |     | | `-DeclRefExpr 0x7ffec382d6d8 <col:15> 'unsigned int (unsigned int)' Function 0x7ffec382d490 'fib' 'unsigned int (unsigned int)'
>     |     | `-BinaryOperator 0x7ffec382d778 <col:19, col:23> 'unsigned int' '-'
>     |     |   |-ImplicitCastExpr 0x7ffec382d748 <col:19> 'unsigned int' <LValueToRValue>
>     |     |   | `-DeclRefExpr 0x7ffec382d700 <col:19> 'unsigned int' lvalue ParmVar 0x7ffec382d3d0 'n' 'unsigned int'
>     |     |   `-ImplicitCastExpr 0x7ffec382d760 <col:23> 'unsigned int' <IntegralCast>
>     |     |     `-IntegerLiteral 0x7ffec382d728 <col:23> 'int' 1
>     |     `-CallExpr 0x7ffec3869ef0 <col:28, col:37> 'unsigned int'
>     |       |-ImplicitCastExpr 0x7ffec3869ed8 <col:28> 'unsigned int (*)(unsigned int)' <FunctionToPointerDecay>
>     |       | `-DeclRefExpr 0x7ffec3869e10 <col:28> 'unsigned int (unsigned int)' Function 0x7ffec382d490 'fib' 'unsigned int (unsigned int)'
>     |       `-BinaryOperator 0x7ffec3869eb0 <col:32, col:36> 'unsigned int' '-'
>     |         |-ImplicitCastExpr 0x7ffec3869e80 <col:32> 'unsigned int' <LValueToRValue>
>     |         | `-DeclRefExpr 0x7ffec3869e38 <col:32> 'unsigned int' lvalue ParmVar 0x7ffec382d3d0 'n' 'unsigned int'
>     |         `-ImplicitCastExpr 0x7ffec3869e98 <col:36> 'unsigned int' <IntegralCast>
>     |           `-IntegerLiteral 0x7ffec3869e60 <col:36> 'int' 2
>
> Is it possible either to improve the accuracy of the debug metadata or to
> resolve the corresponding instruction in a different way? Ideally, I would
> like to leave Clang untouched, i.e. not modify and recompile it.