[LLVMdev] Proposal: Debug information improvement - keep the line number with optimizations
Devang Patel
dpatel at apple.com
Mon Feb 2 13:49:43 PST 2009
Hi Zhou,
There is certainly interest in preserving line number information
(and valid variable info) during optimizations in LLVM. More info
below...
On Feb 2, 2009, at 7:48 AM, Zhou Sheng wrote:
>
> The following sub-sections define specific requirements to improve
> the debug information in LLVM.
>
>
> 2.1 Verification Flow
> The most important goal of this project is to ensure that debug
> information does not block any optimization performed by LLVM
> transform passes. Here I propose a way to determine whether codegen
> is being impacted by debug info. This is also useful for scanning
> the LLVM transform pass list to find which passes need to be updated
> to work with debug information.
>
> From Chris: Add a -strip-debug pass that removes all debug info from
> the LLVM IR. Given this, it would allow us to do:
> $ llvm-gcc -O3 -c -o - | llc > good.s
> $ llvm-gcc -O3 -c -g -o - | opt -strip-debug | llc > test.s
> $ diff good.s test.s
> If the two .s files differed, then badness happened.
This may not work perfectly because the presence of debug info may
influence compiler-generated symbol names and label numbers.
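One way to reduce that noise is to canonicalize local label names
before diffing the two .s files. A rough sketch, assuming local labels
look like LBB1_2 or .LBB1_2 and that comment lines start with '#';
normalize_asm is a hypothetical helper, not an existing LLVM tool:

```shell
#!/bin/sh
# normalize_asm (hypothetical helper): read assembly on stdin and write a
# canonical form on stdout so that label numbering does not affect a diff.
normalize_asm() {
    awk '
    # Drop comment-only and blank lines.
    /^[ \t]*#/ || /^[ \t]*$/ { next }
    {
        line = $0
        out = ""
        # Rename each local label to L<k>, in order of first appearance.
        while (match(line, /\.?L[A-Za-z_]*[0-9]+(_[0-9]+)*/)) {
            pre  = substr(line, 1, RSTART - 1)
            name = substr(line, RSTART, RLENGTH)
            line = substr(line, RSTART + RLENGTH)
            # A match in the middle of an identifier is not a label.
            if (pre ~ /[A-Za-z0-9_]$/) {
                out = out pre name
                continue
            }
            if (!(name in seen)) seen[name] = "L" (++count)
            out = out pre seen[name]
        }
        print out line
    }'
}
```

Typical use would be to run both good.s and test.s through
normalize_asm and diff the results. The label pattern is an assumption
and would need adjusting per target.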
> This obviously only catches badness that happens in the LLVM
> optimizer,
There is an established way to check this. See
http://llvm.org/docs/SourceLevelDebugging.html#debugopt
> if the code generator is broken, we'll need something more
> sophisticated that strips debug info out of the .s file. In any
> case, this is a good place to start, and should be turned into a
> llvm-test TEST/report.
>
> Incidentally, we have to go through codegen, we can't diff .ll files
> after debug info is stripped out. This is because debug info is
> allowed to (and probably does) impact local names within functions,
> but these names are removed at codegen and are not important to
> preserve. End
>
>
>
> 2.2 A Pass to clean up the debug info
> LLVM already has a transform pass, "-strip-debug", that removes all
> debug information. But for the first half of this project, we want
> to keep just the line number information (stop points) in the
> optimized code. So we need a new transform pass that removes only
> the variable declaration information.
FWIW, mem2reg already does this.
> Pass "-strip-debug" also doesn't clean up the dead variables and
> function calls left behind for debug information; it assumes other
> passes like "-dce" or "-globaldce" can handle this.
Yes.
> But as we are also going to update those passes, we can't use them
> in the verification flow, otherwise, it may output incorrect check
> results.
I am not sure I follow this.
>
> The new pass "-strip-debug-pro" should have the following functions:
> 1. Just remove the variable declaration information and
> clean up the dead debug information.
These are two separate tasks.
1) Remove variable declaration info.
This is already done (indirectly) by mem2reg. But a separate pass to
do so won't hurt either.
2) Remove dead debug information.
This is very useful as a separate pass and can be used while
debugging non-optimized code (for example, to remove type info for
types that are not used at all).
> 2. Just remove the line number information and clean up the
> dead debug information.
I am not sure what the purpose of this is.
> 3. Remove all the debug information and clean up.
That's what -strip-debug ("Remove Debug Info") does.
If you put -strip-debug + -dce in one pass then you're not comparing
apples to apples in your 2.1 style verification. Or am I missing
something?
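For an apples-to-apples check, the same cleanup passes can be run on
both the debug and the non-debug input. A minimal sketch, assuming the
2009-era opt/llc command lines used in this thread; compare_codegen
and the OPT, LLC, and CLEANUP variables are hypothetical names:

```shell
#!/bin/sh
# Tool names are variables (an assumption) so they can be swapped out.
OPT=${OPT:-opt}
LLC=${LLC:-llc}
# The identical cleanup pipeline is applied to both inputs.
CLEANUP="-strip-debug -dce"

# compare_codegen DEBUG.bc NODEBUG.bc (hypothetical helper)
# Returns 0 if the generated assembly is identical, non-zero otherwise.
compare_codegen() {
    tmp=$(mktemp -d) || return 2
    $OPT $CLEANUP "$1" | $LLC > "$tmp/test.s"
    $OPT $CLEANUP "$2" | $LLC > "$tmp/good.s"
    diff "$tmp/good.s" "$tmp/test.s"
    status=$?
    rm -rf "$tmp"
    return $status
}
```

Because -dce runs on both sides, any difference it causes cannot be
mistaken for an effect of the debug info alone.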
> 2.3 Front End Changes
> For the first half of the project, we just aim to handle the line
> number debug information. So we need to force llvm-gcc not to emit
> any variable declaration information.
>
> 2.4 Optimization Transform Changes
> According to the output of the check script, we can get a list of
> passes to update. Just follow the list and update the passes one by
> one. When a single pass is done, go back and run llvm/test and
> llvm-test, applying the pass "-strip-debug-pro" right after the
> updated pass to see if it works correctly.
>
> 2. Proposed Work Plan
> This section defines a proposed work plan to accomplish the
> requirements that we desire. The work plan is broken into several
> distinct phases that follow a logical progression of modifications
> to the LLVM software.
>
> 2.1 Phase 1: Establish the testing system
> One of the most useful things to get started is to have some way to
> determine whether codegen is being impacted by debug info. It is
> important to be able to tell when this happens so that we can track
> down these places and fix them.
>
> 2.1.1 Pass Scanning Script
> Following the way proposed by Chris, it is good to have a script to
> scan the standard LLVM transform pass list. We can get the standard
> compile optimization pass list by:
You can use http://llvm.org/docs/SourceLevelDebugging.html#debugopt as
a starting point here.
>
> $ opt -std-compile-opts -debug-pass=Arguments foo.bc > /dev/null
> Pass Arguments: -preverify -domtree -verify -lowersetjmp
> -raiseallocs -simplifycfg -domtree -domfrontier -mem2reg -globalopt
> -globaldce -ipconstprop -deadargelim -instcombine -simplifycfg
> -basiccg -prune-eh -inline -argpromotion -tailduplicate
> -simplify-libcalls -instcombine -jump-threading -simplifycfg -domtree
> -domfrontier -scalarrepl -instcombine -break-crit-edges -condprop
> -tailcallelim -simplifycfg -reassociate -domtree -loops -loopsimplify
> -domfrontier -scalar-evolution -lcssa -loop-rotate -licm -lcssa
> -loop-unswitch -scalar-evolution -lcssa -loop-index-split
> -instcombine -scalar-evolution -domfrontier -lcssa -indvars
> -domfrontier -scalar-evolution -lcssa -loop-unroll -instcombine
> -domtree -memdep -gvn -memcpyopt -sccp -instcombine -break-crit-edges
> -condprop -memdep -dse -mergereturn -postdomtree -postdomfrontier
> -adce -simplifycfg -strip-dead-prototypes -printusedtypes
> -deadtypeelim -constmerge -preverify -domtree -verify
>
>
>
> The script should look like:
> #!/bin/sh
>
> OPTS="-preverify -domtree -verify -lowersetjmp -raiseallocs
> -simplifycfg -domtree -domfrontier -mem2reg -globalopt -globaldce
> -ipconstprop -deadargelim -instcombine -simplifycfg -basiccg
> -prune-eh -inline -argpromotion -tailduplicate -simplify-libcalls
> -instcombine -jump-threading -simplifycfg -domtree -domfrontier
> -scalarrepl -instcombine -break-crit-edges -condprop -tailcallelim
> -simplifycfg -reassociate -domtree -loops -loopsimplify -domfrontier
> -scalar-evolution -lcssa -loop-rotate -licm -lcssa -loop-unswitch
> -scalar-evolution -lcssa -loop-index-split -instcombine
> -scalar-evolution -domfrontier -lcssa -indvars -domfrontier
> -scalar-evolution -lcssa -loop-unroll -instcombine -domtree -memdep
> -gvn -memcpyopt -sccp -instcombine -break-crit-edges -condprop
> -memdep -dse -mergereturn -postdomtree -postdomfrontier -adce
> -simplifycfg -strip-dead-prototypes -printusedtypes -deadtypeelim
> -constmerge -preverify -domtree -verify"
>
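> # Build a debug (.ll) and a non-debug (.bc) version of the input.
> llvm-gcc -g -emit-llvm -c "$1" -o "$1.db1.ll" -S
> llvm-gcc -emit-llvm -c "$1" -o good.bc
>
> # Drop variable declarations so only line number info remains.
> sed '/call void @llvm.dbg.declare/d' "$1.db1.ll" > "$1.db2.ll"
>
> # Assemble; the loop below expects the result in $1.db2.bc.
> llvm-as "$1.db2.ll" -f
>
> for p in $OPTS; do
>     # Apply the pass to the debug version, then strip debug info.
>     opt $p "$1.db2.bc" -o "$1.db2.bc" -f
>     opt -strip-debug -deadtypeelim -dce -globaldce -deadtypeelim \
>         "$1.db2.bc" | llc > test.s -f
>     # Apply the same pass and cleanup to the non-debug version.
>     opt $p -strip-debug -deadtypeelim -dce -globaldce -deadtypeelim \
>         good.bc -o good.bc -f
>     llc good.bc > good.s -f
>     echo "PASS $p : " >> diff.log
>     if diff good.s test.s >> diff.log 2>&1; then
>         echo "PASS $p : SUCC"
>     else
>         echo "PASS $p : FAIL"
>     fi
> done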
>
> For example:
> foo.c:
> int foo(int x, int y) {
> return x + y;
> }
>
> $ ./check.sh foo.c
> PASS -preverify : SUCC
> PASS -domtree : SUCC
> PASS -verify : SUCC
> PASS -lowersetjmp : SUCC
> PASS -raiseallocs : SUCC
> PASS -simplifycfg : SUCC
> PASS -domtree : SUCC
> PASS -domfrontier : SUCC
> PASS -mem2reg : FAIL
> PASS -globalopt : FAIL
> PASS -globaldce : FAIL
> PASS -ipconstprop : FAIL
> PASS -deadargelim : FAIL
> PASS -instcombine : FAIL
> PASS -simplifycfg : FAIL
>
> Check the log file:
> PASS -preverify :
> PASS -domtree :
> PASS -verify :
> PASS -lowersetjmp :
> PASS -raiseallocs :
> PASS -simplifycfg :
> PASS -domtree :
> PASS -domfrontier :
> PASS -mem2reg :
> 8,9c8,14
> < movl 4(%esp), %eax
> < addl 8(%esp), %eax
> ---
> > subl $8, %esp
> > movl 12(%esp), %eax
> > movl %eax, 4(%esp)
> > movl 16(%esp), %eax
> > movl %eax, (%esp)
> > addl 4(%esp), %eax
> > addl $8, %esp
> For the above example, we found that the transform pass "mem2reg"
> clearly did not do its work when debug information was present. So
> we know we need to update it and re-test.
>
>
> 2.1.2 Update the LLVM testing system
> The LLVM testing infrastructure contains two major categories of
> tests: code fragments and whole programs. Code fragments are
> referred to as the "DejaGNU tests" and are in the llvm module in
> subversion under the llvm/test directory. The whole-program tests
> are referred to as the "Test suite" and are in the test-suite module
> in subversion.
> Scan all the test cases, find those using the specified transform,
> and add a script similar to the one shown above. Write the results
> into an llvm-test TEST/report.
>
>
> 2.2 Phase 2: New Pass to Strip Debug Information
> LLVM already has a transform pass, "-strip-debug", that removes all
> debug information. But for the first half of this project, we want
> to keep just the line number information (stop points) in the
> optimized code. So we need a new transform pass that removes only
> the variable declaration information. Pass "-strip-debug" also
> doesn't clean up the dead variables and function calls left behind
> for debug information; it assumes other passes like "-dce" or
> "-globaldce" can handle this. But as we are also going to update
> those passes, we can't use them in the verification flow; otherwise,
> it may output incorrect check results.
>
> The new pass "-strip-debug-pro" should have the following functions:
> 1. Just remove the variable declaration information and
> clean up the dead debug information.
>
> 2. Remove all the debug information and clean up
>
> 3.2.1 Work Plan
> 1. Use the transform pass StripSymbols.cpp as a reference
> 2. Based on StripSymbols.cpp, add an option to it to just
> remove debug information, like "-rm-debug"
That's what -strip-debug is doing.
> 3. Add an option to just remove the variable declaration
> information, like "-rm-debug=2"
Why not -strip-debug=2 if you want a way to remove variable
declarations?
> 4. Add a procedure to clean up the dead variables and
> function calls that exist only for debug purposes.
>
> 2.3 Phase 3: Extend llvm-gcc
> Once we have a way to verify what is happening, I propose that we
> aim for an intermediate point: instead of having -O disable all
> debug info, we should make it disable just variable information, but
> keep emitting line number info. This would allow stepping through
> the program, getting stack traces, use performance tools like shark,
> etc.
>
> We need the front-end llvm-gcc to have a mode that causes it to emit
> line number info but not variable info. Then we can go through the
> process above to identify passes that change behavior when line
> number intrinsics are in the code.
>
> 1.3.1 Work Plan
> 1. First, locate where llvm-gcc handles its command-line
> options.
> 2. Add a new option that controls which debug information
> llvm-gcc emits, e.g. -g1 to emit only line number info
> 3. Build the new llvm-gcc
> 4. Test with llvm/test and llvm-test
>
> 2.4 Phase 4: Update Transform Passes for Line Number Info.
> When the front-end has a mode that causes it to emit line number
> info but not variable info, we can go through the process above to
> identify passes that change behavior when line number intrinsics are
> in the code.
I think the optimizer is not changing behavior when dbg info is
present. Try running the dbgopt tests.
> Obvious cases are things like loop unroll and inlining: they
> 'measure' the size of some code to determine whether to unroll it or
> not. This means that it should be enhanced to ignore debug
> intrinsics for the sake of code size estimation.
The loop unrolling pass already ignores the debug info! See
LoopUnroll.cpp::ApproximateLoopSize()
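The idea of skipping debug intrinsics when measuring code size can be
illustrated on textual IR, outside the pass itself. The sketch below
is not the LoopUnroll.cpp code; it assumes instructions are indented,
that labels and comment lines should not be counted, and that debug
intrinsic calls contain "@llvm.dbg."; count_insts is a hypothetical
helper:

```shell
#!/bin/sh
# count_insts (hypothetical helper): count instruction lines in textual
# LLVM IR read from stdin, ignoring calls to debug intrinsics.
count_insts() {
    # Keep indented, non-comment lines (instructions), then count the
    # ones that do not mention a debug intrinsic.
    grep -E '^[[:space:]]+[^[:space:];]' | grep -vc '@llvm\.dbg\.'
}
```

With this kind of measure, a function body reports the same size with
and without debug intrinsics, so a size threshold would give the same
decision in both cases.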
>
> Another example is optimizations like SimplifyCFG when it merges if/
> then/else into select instructions. SimplifyCFG will have to be
> enhanced to ignore debug intrinsics when doing its safety/
> profitability analysis,
I think, it handles this part well, but ...
> but then it will also have to be updated to just delete the line
> number intrinsics when it does the xform. This is simplifycfg's way
> of "updating" the debug info for this example transformation.
.. the second part has not received full attention.
> As we progress through various optimizations, we will find cases
> where it is possible to update (e.g. loop unroll or inlining, which
> doesn't have to do anything special to update line #'s) and places
> where it isn't. As long as the debug intrinsics don't affect
> codegen, we are happy, even if the debug intrinsics are deleted in
> cases where it would be possible to update them (this becomes an
> optimized debugging QoI issue).
>
>
>
> 3.4.1 Work Plan
> 1. Update transform pass mem2reg
> 2. Test with llvm/test and llvm-test
> 3. Update transform pass simplifycfg
> 4. Test with llvm/test and llvm-test
> 5. Likewise, update transform passes globalopt, globaldce,
> ipconstprop, deadargelim, instcombine...
> 6. Update the other passes and test them.
>
I'm looking forward to your contributions in this area.
-
Devang