[LLVMdev] Proposal: Debug information improvement - keep the line number with optimizations

Devang Patel dpatel at apple.com
Mon Feb 2 13:49:43 PST 2009


Hi Zhou,

There is certainly interest in preserving line number information
(and valid variable info) during optimizations in LLVM. More info
below...

On Feb 2, 2009, at 7:48 AM, Zhou Sheng wrote:

>

> The following sub-sections define specific requirements to improve  
> the debug information in LLVM.
>
>
> 2.1   Verification Flow
> The most important part of this project is to make sure the debug
> information does not block any optimization by LLVM transform passes.
> Here I propose a way to determine whether codegen is being impacted by
> debug info. This is also useful for scanning the LLVM transform
> pass list to find which passes need to be updated to work with debug
> information.
>
> From Chris: Add a -strip-debug pass that removes all debug info from  
> the LLVM IR. Given this, it would allow us to do:
>        $ llvm-gcc -O3 -c -o - | llc > good.s
>        $ llvm-gcc -O3 -c -g -o - | opt -strip-debug | llc > test.s
>        $ diff good.s test.s



>  If the two .s files differed, then badness happened.

This may not work perfectly because the presence of debug info may
influence compiler-generated symbol names and label numbers.
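
One way to make the diff less sensitive to label renumbering is to rewrite label numbers to a placeholder before comparing. This is only a rough sketch: it assumes basic-block labels of the form LBB1_2 or .LBB1_2 (the exact spelling depends on the target), and normalize_labels is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Hypothetical helper: replace the numeric parts of basic-block labels with "N"
# so that pure renumbering does not show up in a diff.
# Assumption: labels look like LBB<fn>_<bb>, optionally with a leading dot.
# Reads the named files, or stdin when no file is given.
normalize_labels() {
  sed 's/LBB[0-9][0-9]*_[0-9][0-9]*/LBBN_N/g' "$@"
}
```

It could be applied to both files before the diff, e.g. `normalize_labels good.s > good.norm.s` and likewise for test.s. Note that this is lossy: a branch that really targets a different block would be hidden too, so it is best used only to filter noise, not as the final check.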

> This obviously only catches badness that happens in the LLVM  
> optimizer,

There is an established way to check this. See
	http://llvm.org/docs/SourceLevelDebugging.html#debugopt

> if the code generator is broken, we'll need something more  
> sophisticated that strips debug info out of the .s file.  In any  
> case, this is a good place to start, and should be turned into a  
> llvm-test TEST/report.
>
> Incidentally, we have to go through codegen; we can't diff .ll files
> after debug info is stripped out. This is because debug info is
> allowed to (and probably does) impact local names within functions,
> but these names are removed at codegen and are not important to
> preserve. End
>
>
>
> 2.2   A Pass to clean up the debug info
> LLVM already has a transform pass "-strip-debug", which removes all the
> debug information. But for the first half of this project, we want
> to keep just the line number information (stop point) in the
> optimized code. So we need a new transform pass that removes only the
> variable declaration information.

FWIW, mem2reg already does this.

> Pass "-strip-debug" also doesn't clean up the dead variables and
> function calls left behind for debug information; it assumes other
> passes like "-dce" or "-globaldce" can handle this.

Yes.

> But as we are also going to update those passes, we can't use them  
> in the verification flow, otherwise, it may output incorrect check  
> results.

I am not sure I follow this.

>
> The new pass "-strip-debug-pro" should have the following functions:
> 1.         Just remove the variable declaration information and  
> clean up the dead debug information.

These are two separate tasks.
	1) Remove variable declaration info.
		This is already done (indirectly) by mem2reg. But a separate pass to  
do so won't hurt either.
	2) Remove dead debug information.
		This is very useful as a separate pass and can be used while  
debugging non optimized code (for example, to remove type info for the  
types that are not used at all).

> 2.         Just remove the line number information and clean up the  
> dead debug information.

I am not sure what the purpose of this is.

> 3.         Remove all the debug information and clean up.

That's what -strip-debug ("Remove Debug Info") does.

If you put -strip-debug + -dce in one pass then you're not comparing
apples to apples in your 2.1-style verification. Or am I missing
something?


>  2.3   Front End Changes
> For the first half of the project, we just aim to handle the line  
> number debug information. So we need to force llvm-gcc not to emit  
> any variable declaration information.
>
> 2.4   Optimization Transform Changes
> According to the output of the check script, we can get a list of
> passes to update. Just follow the list to update the passes one by one.
> When a single pass is done, go back and run llvm/test and llvm-test.
> Note: apply the pass "-strip-debug-pro" right after the updated
> pass to see if it works correctly.
>
> 3.      Proposed Work Plan
> This section defines a proposed work plan to accomplish the
> requirements that we desire. The work plan is broken into several
> distinct phases that follow a logical progression of modifications
> to the LLVM software.
>
> 3.1   Phase 1: Establish the testing system
> One of the most useful things to get started is to have some way to  
> determine whether codegen is being impacted by debug info.  It is  
> important to be able to tell when this happens so that we can track  
> down these places and fix them.
>
> 3.1.1    Pass Scanning Script
> Following the way proposed by Chris, it is good to have a script to  
> scan the standard LLVM transform pass list. We can get the standard  
> compile optimization pass list by:

You can use http://llvm.org/docs/SourceLevelDebugging.html#debugopt as  
a starting point here.
>
>        $ opt -std-compile-opts -debug-pass=Arguments foo.bc > /dev/null
> Pass Arguments:  -preverify -domtree -verify -lowersetjmp -raiseallocs
> -simplifycfg -domtree -domfrontier -mem2reg -globalopt -globaldce
> -ipconstprop -deadargelim -instcombine -simplifycfg -basiccg -prune-eh
> -inline -argpromotion -tailduplicate -simplify-libcalls -instcombine
> -jump-threading -simplifycfg -domtree -domfrontier -scalarrepl
> -instcombine -break-crit-edges -condprop -tailcallelim -simplifycfg
> -reassociate -domtree -loops -loopsimplify -domfrontier
> -scalar-evolution -lcssa -loop-rotate -licm -lcssa -loop-unswitch
> -scalar-evolution -lcssa -loop-index-split -instcombine
> -scalar-evolution -domfrontier -lcssa -indvars -domfrontier
> -scalar-evolution -lcssa -loop-unroll -instcombine -domtree -memdep
> -gvn -memcpyopt -sccp -instcombine -break-crit-edges -condprop -memdep
> -dse -mergereturn -postdomtree -postdomfrontier -adce -simplifycfg
> -strip-dead-prototypes -printusedtypes -deadtypeelim -constmerge
> -preverify -domtree -verify
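
Rather than pasting this list into a script by hand, it could be captured from opt's output. A minimal sketch, assuming the output format shown above (a line starting with "Pass Arguments:"); extract_passes is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read the text printed by
#   opt -std-compile-opts -debug-pass=Arguments ...
# on stdin and print the pass flags from the first "Pass Arguments:" line.
extract_passes() {
  sed -n 's/^Pass Arguments:[[:space:]]*//p' | head -n 1
}
```

Assuming opt prints the list on stderr (which the `> /dev/null` redirect above suggests), it might be used as `OPTS=$(opt -std-compile-opts -debug-pass=Arguments foo.bc 2>&1 >/dev/null | extract_passes)`, so the script follows the pass list of whatever opt is installed.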
>
>
>
> The script should look like:
> #!/bin/sh
>
> OPTS="-preverify -domtree -verify -lowersetjmp -raiseallocs
> -simplifycfg -domtree -domfrontier -mem2reg -globalopt -globaldce
> -ipconstprop -deadargelim -instcombine -simplifycfg -basiccg -prune-eh
> -inline -argpromotion -tailduplicate -simplify-libcalls -instcombine
> -jump-threading -simplifycfg -domtree -domfrontier -scalarrepl
> -instcombine -break-crit-edges -condprop -tailcallelim -simplifycfg
> -reassociate -domtree -loops -loopsimplify -domfrontier
> -scalar-evolution -lcssa -loop-rotate -licm -lcssa -loop-unswitch
> -scalar-evolution -lcssa -loop-index-split -instcombine
> -scalar-evolution -domfrontier -lcssa -indvars -domfrontier
> -scalar-evolution -lcssa -loop-unroll -instcombine -domtree -memdep
> -gvn -memcpyopt -sccp -instcombine -break-crit-edges -condprop -memdep
> -dse -mergereturn -postdomtree -postdomfrontier -adce -simplifycfg
> -strip-dead-prototypes -printusedtypes -deadtypeelim -constmerge
> -preverify -domtree -verify"
>
> # Build the same source with and without debug info.
> llvm-gcc -g -emit-llvm -c "$1" -o "$1.db1.ll" -S
> llvm-gcc -emit-llvm -c "$1" -o good.bc
>
> # Drop variable declaration info, keeping only line number info.
> sed '/call void @llvm.dbg.declare/d' "$1.db1.ll" > "$1.db2.ll"
>
> llvm-as "$1.db2.ll" -f
>
> CLEANUP="-strip-debug -deadtypeelim -dce -globaldce -deadtypeelim"
>
> # Apply the passes cumulatively and compare the generated code after each.
> for p in $OPTS; do
>   opt $p "$1.db2.bc" -o "$1.db2.bc" -f
>   opt $CLEANUP "$1.db2.bc" | llc > test.s -f
>   opt $p $CLEANUP good.bc -o good.bc -f
>   llc good.bc > good.s -f
>   echo "PASS $p : " >> diff.log
>   if diff good.s test.s >> diff.log 2>&1; then
>       echo "PASS $p : SUCC"
>   else
>       echo "PASS $p : FAIL"
>   fi
> done
>
> For example:
> foo.c:
> int foo(int x, int y) {
>   return x + y;
> }
>
> $ ./check.sh foo.c
> PASS -preverify : SUCC
> PASS -domtree : SUCC
> PASS -verify : SUCC
> PASS -lowersetjmp : SUCC
> PASS -raiseallocs : SUCC
> PASS -simplifycfg : SUCC
> PASS -domtree : SUCC
> PASS -domfrontier : SUCC
> PASS -mem2reg : FAIL
> PASS -globalopt : FAIL
> PASS -globaldce : FAIL
> PASS -ipconstprop : FAIL
> PASS -deadargelim : FAIL
> PASS -instcombine : FAIL
> PASS -simplifycfg : FAIL
>
> Check the log file:
> PASS -preverify :
> PASS -domtree :
> PASS -verify :
> PASS -lowersetjmp :
> PASS -raiseallocs :
> PASS -simplifycfg :
> PASS -domtree :
> PASS -domfrontier :
> PASS -mem2reg :
> 8,9c8,14
> <   movl    4(%esp), %eax
> <   addl    8(%esp), %eax
> ---
> >   subl    $8, %esp
> >   movl    12(%esp), %eax
> >   movl    %eax, 4(%esp)
> >   movl    16(%esp), %eax
> >   movl    %eax, (%esp)
> >   addl    4(%esp), %eax
> >   addl    $8, %esp
> For the above example, we found that the transform pass "mem2reg"
> obviously did not do its work when debug information was kept. Then
> we know we need to update it and re-test.
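
Because the script applies the passes cumulatively to the same .bc file, every pass after the first failure is likely to be reported as FAIL too, so the first FAIL is the one worth investigating. A small sketch for picking it out of the script's output; first_failing_pass is a hypothetical helper name, and the input format is assumed to be exactly the "PASS <flag> : SUCC/FAIL" lines shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the first pass reported as FAIL.
# Input: lines such as "PASS -mem2reg : FAIL", read from the named files
# or from stdin when no file is given.
first_failing_pass() {
  awk '$1 == "PASS" && $NF == "FAIL" { print $2; exit }' "$@"
}
```

For example, `./check.sh foo.c | first_failing_pass` would print the flag of the first pass whose output differed, and nothing if every pass succeeded.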
>
>
> 3.1.2    Update the LLVM testing system
> The LLVM testing infrastructure contains two major categories of
> tests: code fragments and whole programs. Code fragments are
> referred to as the "DejaGNU tests" and are in the llvm module in
> subversion under the llvm/test directory. The whole-program tests
> are referred to as the "Test suite" and are in the test-suite module
> in subversion.
> Scan all the test cases, find those using the specified
> transform, and add a script similar to the one mentioned previously.
> Write the results into an llvm-test TEST/report.
>
>
> 3.2   Phase 2: New Pass to Strip Debug Information
> LLVM already has a transform pass "-strip-debug", which removes all the
> debug information. But for the first half of this project, we want
> to keep just the line number information (stop point) in the
> optimized code. So we need a new transform pass that removes only the
> variable declaration information. Pass "-strip-debug" also doesn't
> clean up the dead variables and function calls left behind for debug
> information; it assumes other passes like "-dce" or "-globaldce" can
> handle this. But as we are also going to update those passes, we
> can't use them in the verification flow; otherwise, it may output
> incorrect check results.
>
> The new pass "-strip-debug-pro" should have the following functions:
> 1.         Just remove the variable declaration information and  
> clean up the dead debug information.
>
> 2.         Remove all the debug information and clean up
>
> 3.2.1    Work Plan
> 1.         Use the transform pass StripSymbols.cpp as a reference.
> 2.         Based on StripSymbols.cpp, add an option to it to just
> remove debug information, like "-rm-debug"

That's what -strip-debug is doing.

> 3.         Add an option to just remove the variable declaration
> information, like "-rm-debug=2"

Why not -strip-debug=2 if you want a way to remove variable
declarations?

> 4.         Add a procedure to clean up the dead variables and  
> function calls for debug purpose.
>
> 3.3   Phase 3: Extend llvm-gcc
> Once we have a way to verify what is happening, I propose that we
> aim for an intermediate point: instead of having -O disable all
> debug info, we should make it disable just variable information, but
> keep emitting line number info.  This would allow stepping through
> the program, getting stack traces, using performance tools like Shark,
> etc.
>
> We need the front-end llvm-gcc to have a mode that causes it to emit
> line number info but not variable info. Then we can go through the
> process above to identify passes that change behavior when line
> number intrinsics are in the code.
>
> 3.3.1    Work Plan
> 1.         First locate the place where llvm-gcc handles the
> command-line options.
> 2.         Add a new option to make llvm-gcc emit only the
> specified debug information, like -g1, where -g1 emits only line
> number info.
> 3.         Build the new llvm-gcc
> 4.         Test through llvm/test, llvm-test
>
> 3.4   Phase 4: Update Transform Passes for Line Number Info.
> When the front-end has a mode that causes it to emit line number  
> info but not variable info, we can go through the process above to  
> identify passes that change behavior when line number intrinsics are  
> in the code.

I think the optimizer does not change behavior when dbg info is
present. Try running the dbgopt tests.

>   Obvious cases are things like loop unroll and inlining: they  
> 'measure' the size of some code to determine whether to unroll it or  
> not. This means that it should be enhanced to ignore debug  
> intrinsics for the sake of code size estimation.

The loop unrolling pass already ignores the debug info! See  
LoopUnroll.cpp::ApproximateLoopSize()
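
To illustrate the idea only (this is a toy model on textual IR, not how LoopUnroll.cpp does it): a size estimate that ignores debug intrinsics simply skips calls to llvm.dbg.* when counting instructions. The sketch assumes instructions are indented and labels start in column 0; approx_size_no_dbg is a hypothetical helper name.

```shell
#!/bin/sh
# Toy model: count instruction-like lines in textual LLVM IR, skipping
# calls to llvm.dbg.* intrinsics.
# Assumptions: instructions are indented, labels start in column 0,
# and comment-only lines begin with ';' after the indentation.
approx_size_no_dbg() {
  grep -v '@llvm\.dbg\.' "$@" | grep -c '^[[:space:]]\{1,\}[^;[:space:]]'
}
```

With this model, a block has the same estimated size with and without debug intrinsics, which is the property a size-based heuristic needs in order not to change its decision under -g.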

>
> Another example is optimizations like SimplifyCFG when it merges if/ 
> then/else into select instructions. SimplifyCFG will have to be  
> enhanced to ignore debug intrinsics when doing its safety/ 
> profitability analysis,

I think, it handles this part well, but ...
> but then it will also have to be updated to just delete the line  
> number intrinsics when it does the xform. This is simplifycfg's way  
> of "updating" the debug info for this example transformation.

.. the second part has not received full attention.

> As we progress through various optimizations, we will find cases  
> where it is possible to update (e.g. loop unroll or inlining, which  
> doesn't have to do anything special to update line #'s) and places  
> where it isn't.  As long as the debug intrinsics don't affect  
> codegen, we are happy, even if the debug intrinsics are deleted in  
> cases where it would be possible to update them (this becomes an
> optimized-debugging QoI issue).
>
>
>
> 3.4.1 Work Plan
> 1.         Update transform pass mem2reg

> 2.         Testing through llvm/test, llvm-test
> 3.         Update transform pass simplifycfg
> 4.         Testing through llvm/test, llvm-test
> 5.         Likewise, update transform passes globalopt, globaldce,  
> ipconstprop, deadargelim, instcombine...
> 6.         Update other passes and testing them.
>


I'm looking forward to your contributions in this area.
-
Devang






