[LLVMdev] Proposal: Debug information improvement - keep the line number with optimizations

Mon Feb 2 07:48:24 PST 2009

Hi,

I've been thinking about how to preserve line number information through
the LLVM transform and analysis passes.
Basically, I agree with Chris's notes (
http://www.nondot.org/sabre/LLVMNotes/DebugInfoImprovements.txt), and I
will follow his approach to keep line number information available when
optimization is enabled.

Here is a detailed proposal:

1.      Introduction

At the time of this writing, LLVM's DWARF debug info generation works
reasonably well at -O0, but it is completely disabled at optimization levels
-O1 and higher.  This is because our debug info representation interferes
with optimizations, transparently disabling them in cases where they would
not update it correctly.  This is useful for preserving correct debug info,
but it is not what people expect when they use 'llvm-gcc -O3 -g foo.c'.
(From Chris Lattner)

 ...

 ...

This document describes a path forward that will get us to a place where
turning on debug info does not pessimize code, and still preserves the
invariant that we don't produce bogus debug info.

*1.1.    Goals*

The goals for this project are to:

1.         Enable optimization when line number info is turned on.

2.         Do not generate incorrect or bogus line number info.

The goals of this proposal are to:

1.         Clearly state the requirements.

2.         Identify a work plan that will satisfy those requirements.

3.         Estimate time, schedule and cost for the work plan.

*1.2.    Resources, Tools and Methods*

This work will be accomplished using the following:

1.         Intel(R) Xeon(R) CPU E5420 2.50GHz hardware

2.         Linux 2.6 Kernel (Fedora Core 6 or CentOS 5)

3.         GCC 4.1 compiler

4.         C++ programming language

5.         LLVM, RELEASE 2.3

6.         LLVM-GCC4.2 RELEASE 2.3

The method for performing this work will use the LLVM project's open source
development policies, which include:

1.         Incremental development. A progression of small changes to the
LLVM code base will be made. To incrementally move LLVM in the desired
direction, this generally means adding features and testing them before
removing the functionality they are intended to replace. This approach
maintains compatibility with previous designs until new designs are provably
correct.

2.         Validation with test cases. With each incremental change, the
llvm/test and llvm-test test suites should be run to (a) prove the new
functionality and (b) expose potential weaknesses in the implementation.
Additionally, a set of unit test cases should be developed for each new
feature.

3.         Milestone Validation. Each milestone will be recognized as
completed when the associated set of test cases functions correctly.

4.         Open Source with peer review (Optional). Each patch will be
submitted to the llvm-commits list by email for review and comment, so that
the work can be contributed back to LLVM in the future.

5.         Incremental documentation. As new features are added to LLVM, the
documentation will be updated at the same time.

6.         Bugzilla Tracking.

2.      Requirements

Because LLVM optimization passes change the original input code
substantially, keeping debug information in the optimized code is not
trivial. The key point is that whatever debug information remains in the
optimized code must be correct. For now, I do not think there is a complete
solution to this problem. A reasonable scheme is to keep the debug info that
is still correct and to remove the debug info that an optimization has
invalidated, so that we never generate silently broken information. (From
Chris)

This is a long project, and will take quite a bit of work in all areas
before we can declare "success", but it is worthwhile, and important and
useful steps can be made without solving the whole problem. This proposal
addresses half of the problem: keeping line number information in optimized
code. (From Chris)
<http://www.nondot.org/sabre/LLVMNotes/EmbeddedMetadata.txt>

The following sub-sections define specific requirements to improve the debug
information in LLVM.

*2.1   Verification Flow*

The most important goal of this project is to make sure that debug
information does not block any optimization performed by the LLVM transform
passes. Here I propose a way to determine whether codegen is being impacted
by debug info. It is also useful for scanning the LLVM transform pass list to
find which passes need to be updated to work with debug information.

*From Chris:* Add a -strip-debug pass that removes all debug info from the
LLVM IR. Given this, it would allow us to do:

       $ llvm-gcc -O3 -c -o - | llc > good.s

       $ llvm-gcc -O3 -c -g -o - | opt -strip-debug | llc > test.s

       $ diff good.s test.s

If the two .s files differed, then badness happened.  This obviously only
catches badness that happens in the LLVM optimizer, if the code generator is
broken, we'll need something more sophisticated that strips debug info out
of the .s file.  In any case, this is a good place to start, and should be
turned into a llvm-test TEST/report.

Incidentally, we have to go through codegen; we can't diff .ll files after
debug info is stripped out. This is because debug info is allowed to (and
probably does) impact local names within functions, but these names are
removed at codegen and are not important to preserve. *End*

*2.2   A Pass to clean up the debug info*

LLVM already has a transform pass, "-strip-debug", which removes all debug
information. For the first half of this project, however, we want to keep
just the line number information (stop points) in the optimized code, so we
need a new transform pass that removes only the variable declaration
information. In addition, "-strip-debug" does not clean up the dead variables
and function calls left behind by debug information; it assumes that other
passes such as "-dce" or "-globaldce" will handle this. Since we are also
going to update those passes, we cannot rely on them in the verification
flow; otherwise the check results may be incorrect.

The new pass "-strip-debug-pro" should have the following functions:

1.         Just remove the variable declaration information and clean up the
dead debug information.

2.         Just remove the line number information and clean up the dead
debug information.

3.         Remove all the debug information and clean up.
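
The three functions listed above can be approximated at the text level on a
.ll file, which may help when prototyping the verification flow before the
real pass exists. This is only a sketch under stated assumptions: the helper
name strip_debug_text and its mode names are hypothetical, it assumes the
debug intrinsics appear as "call void @llvm.dbg.*" lines (stop points as
llvm.dbg.stoppoint, variable declarations as llvm.dbg.declare), and it does
not clean up the dead debug descriptors that the proposed pass must remove.

```shell
#!/bin/sh
# Text-level approximation of the three proposed stripping modes.
# strip_debug_text is a hypothetical helper, not an LLVM tool. It reads
# textual IR on stdin and writes the filtered IR to stdout.
#   vars  - drop variable declaration intrinsics (llvm.dbg.declare)
#   lines - drop line number intrinsics (llvm.dbg.stoppoint)
#   all   - drop every call to an llvm.dbg.* intrinsic
strip_debug_text() {
  case "$1" in
    vars)  sed '/call void @llvm\.dbg\.declare/d' ;;
    lines) sed '/call void @llvm\.dbg\.stoppoint/d' ;;
    all)   sed '/call void @llvm\.dbg\./d' ;;
    *)     echo "usage: strip_debug_text vars|lines|all" >&2; return 1 ;;
  esac
}
```

For example, "strip_debug_text vars < foo.ll > foo.lines-only.ll" would keep
the stop points and drop the declarations, which is the same effect as the
sed command used in the scanning script.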

*2.3   Front End Changes*

For the first half of the project, we just aim to handle the line number
debug information. So we need to force llvm-gcc not to emit any variable
declaration information.

*2.4   Optimization Transform Changes*

From the output of the check script, we can build a list of passes that need
to be updated, and then update the passes on that list one by one.

After each pass is updated, run llvm/test and llvm-test again, applying the
pass "-strip-debug-pro" right after the updated pass to check that it works
correctly.

3.      Proposed Work Plan

This section defines a proposed work plan to meet the requirements stated
above. The work plan is broken into several distinct phases that follow a
logical progression of modifications to the LLVM software.

*3.1   Phase 1: Establish the testing system*

One of the most useful things to get started is to have some way to
determine whether codegen is being impacted by debug info.  It is important
to be able to tell when this happens so that we can track down these places
and fix them.

*3.1.1    Pass Scanning Script*

Following the approach proposed by Chris, it is useful to have a script that
scans the standard LLVM transform pass list. The standard compile
optimization pass list can be obtained with:

       $ opt -std-compile-opts -debug-pass=Arguments foo.bc > /dev/null

Pass Arguments:  -preverify -domtree -verify -lowersetjmp -raiseallocs
-simplifycfg -domtree -domfrontier -mem2reg -globalopt -globaldce
-ipconstprop -deadargelim -instcombine -simplifycfg -basiccg -prune-eh
-inline -argpromotion -tailduplicate -simplify-libcalls -instcombine
-jump-threading -simplifycfg -domtree -domfrontier -scalarrepl -instcombine
-break-crit-edges -condprop -tailcallelim -simplifycfg -reassociate -domtree
-loops -loopsimplify -domfrontier -scalar-evolution -lcssa -loop-rotate
-licm -lcssa -loop-unswitch -scalar-evolution -lcssa -loop-index-split
-instcombine -scalar-evolution -domfrontier -lcssa -indvars -domfrontier
-scalar-evolution -lcssa -loop-unroll -instcombine -domtree -memdep -gvn
-memcpyopt -sccp -instcombine -break-crit-edges -condprop -memdep -dse
-mergereturn -postdomtree -postdomfrontier -adce -simplifycfg
-strip-dead-prototypes -printusedtypes -deadtypeelim -constmerge -preverify
-domtree -verify

The script should look like:

#!/bin/sh

OPTS="-preverify -domtree -verify -lowersetjmp -raiseallocs -simplifycfg
-domtree -domfrontier -mem2reg -globalopt -globaldce -ipconstprop
-deadargelim -instcombine -simplifycfg -basiccg -prune-eh -inline
-argpromotion -tailduplicate -simplify-libcalls -instcombine -jump-threading
-simplifycfg -domtree -domfrontier -scalarrepl -instcombine
-break-crit-edges -condprop -tailcallelim -simplifycfg -reassociate -domtree
-loops -loopsimplify -domfrontier -scalar-evolution -lcssa -loop-rotate
-licm -lcssa -loop-unswitch -scalar-evolution -lcssa -loop-index-split
-instcombine -scalar-evolution -domfrontier -lcssa -indvars -domfrontier
-scalar-evolution -lcssa -loop-unroll -instcombine -domtree -memdep -gvn
-memcpyopt -sccp -instcombine -break-crit-edges -condprop -memdep -dse
-mergereturn -postdomtree -postdomfrontier -adce -simplifycfg
-strip-dead-prototypes -printusedtypes -deadtypeelim -constmerge -preverify
-domtree -verify"

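# Usage: ./check.sh foo.c
src="$1"

# Build the input with debug info (as textual IR) and the reference without it.
llvm-gcc -g -emit-llvm -c "$src" -o "$src.db1.ll" -S
llvm-gcc -emit-llvm -c "$src" -o good.bc

# Drop the variable declaration intrinsics so only line number info remains.
sed '/call void @llvm.dbg.declare/d' "$src.db1.ll" > "$src.db2.ll"
llvm-as "$src.db2.ll" -o "$src.db2.bc" -f

# Apply the passes cumulatively and compare the generated code after each one.
for p in $OPTS; do
  opt $p "$src.db2.bc" -o "$src.db2.bc" -f
  opt -strip-debug -deadtypeelim -dce -globaldce -deadtypeelim "$src.db2.bc" | llc > test.s -f
  opt $p -strip-debug -deadtypeelim -dce -globaldce -deadtypeelim good.bc -o good.bc -f
  llc good.bc > good.s -f
  echo "PASS $p : " >> diff.log
  # Test diff's exit status directly; the assembly differences go to the log.
  if diff good.s test.s >> diff.log 2>&1; then
      echo "PASS $p : SUCC"
  else
      echo "PASS $p : FAIL"
  fi
done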

For example:

foo.c:

int foo(int x, int y) {

  return x + y;

}

$ ./check.sh foo.c

PASS -preverify : SUCC

PASS -domtree : SUCC

PASS -verify : SUCC

PASS -lowersetjmp : SUCC

PASS -raiseallocs : SUCC

PASS -simplifycfg : SUCC

PASS -domtree : SUCC

PASS -domfrontier : SUCC

PASS -mem2reg : FAIL

PASS -globalopt : FAIL

PASS -globaldce : FAIL

PASS -ipconstprop : FAIL

PASS -deadargelim : FAIL

PASS -instcombine : FAIL

PASS -simplifycfg : FAIL

Check the log file:

PASS -preverify :

PASS -domtree :

PASS -verify :

PASS -lowersetjmp :

PASS -raiseallocs :

PASS -simplifycfg :

PASS -domtree :

PASS -domfrontier :

PASS -mem2reg :

8,9c8,14

<   movl    4(%esp), %eax

<   addl    8(%esp), %eax

---

>   subl    $8, %esp

>   movl    12(%esp), %eax

>   movl    %eax, 4(%esp)

>   movl    16(%esp), %eax

>   movl    %eax, (%esp)

>   addl    4(%esp), %eax

>   addl    $8, %esp

In the above example, the transform pass "mem2reg" clearly did not do its
work when debug information was present: with debug info, the arguments are
still spilled to the stack. We therefore know that we need to update this
pass and re-test.
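
Because the script overwrites the .bc files on every iteration, the passes
are applied cumulatively, so once one pass diverges the later comparisons can
keep failing for the same reason. The first FAIL line is therefore the most
informative one. A small helper can pull it out of the script's output; the
name first_failing_pass is hypothetical, and the sketch assumes the output
format "PASS <name> : SUCC|FAIL" shown above.

```shell
#!/bin/sh
# first_failing_pass is a hypothetical helper. It reads the
# "PASS <name> : SUCC|FAIL" lines printed by the check script on stdin
# and prints the name of the first pass that was reported as FAIL.
# It prints nothing if every pass succeeded.
first_failing_pass() {
  awk '$1 == "PASS" && $NF == "FAIL" { print $2; exit }'
}
```

It could be used as "./check.sh foo.c | first_failing_pass". After that pass
is fixed, re-running the script shows whether the remaining FAIL lines were
only a consequence of the first one.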

*3.1.2    Update the LLVM testing system*

The LLVM testing infrastructure contains two major categories of tests: code
fragments and whole programs. Code fragments are referred to as the "DejaGNU
tests" and are in the llvm module in subversion under the llvm/test
directory. The whole programs tests are referred to as the "Test suite" and
are in the test-suite module in subversion.

Scan all the test cases, find those that use the transform being updated, and
add a check similar to the script described above.

Have the results written into an llvm-test TEST/report.

*3.2   Phase 2: New Pass to Strip Debug Information*

LLVM already has a transform pass, "-strip-debug", which removes all debug
information. For the first half of this project, however, we want to keep
just the line number information (stop points) in the optimized code, so we
need a new transform pass that removes only the variable declaration
information. In addition, "-strip-debug" does not clean up the dead variables
and function calls left behind by debug information; it assumes that other
passes such as "-dce" or "-globaldce" will handle this. Since we are also
going to update those passes, we cannot rely on them in the verification
flow; otherwise the check results may be incorrect.

The new pass "-strip-debug-pro" should have the following functions:

1.         Just remove the variable declaration information and clean up the
dead debug information.

2.         Remove all the debug information and clean up.

*3.2.1    Work Plan*

1.         Use the existing transform pass in StripSymbols.cpp as a
reference.

2.         Based on StripSymbols.cpp, add an option that removes only the
debug information, such as "-rm-debug".

3.         Add an option that removes only the variable declaration
information, such as "-rm-debug=2".

4.         Add a procedure to clean up the dead variables and function calls
that exist only for debug purposes.

*3.3   Phase 3: Extend llvm-gcc*

Once we have a way to verify what is happening, I propose that we aim for an
intermediate point: instead of having -O disable all debug info, we should
make it disable just variable information, but keep emitting line number
info.  This would allow stepping through the program, getting stack traces,
using performance tools like shark, etc.

We need the llvm-gcc front end to have a mode that causes it to emit line
number info but not variable info. With that mode, we can go through the
process above to identify passes that change behavior when line number
intrinsics are in the code.

*3.3.1    Work Plan*

1.         Locate the place where llvm-gcc handles its command-line options.

2.         Add a new option that controls which debug information llvm-gcc
emits, for example "-g1" to emit only line number information.

3.         Build the new llvm-gcc.

4.         Test with llvm/test and llvm-test.

*3.4   Phase 4: Update Transform Passes for Line Number Info*

When the front-end has a mode that causes it to emit line number info but
not variable info, we can go through the process above to identify passes
that change behavior when line number intrinsics are in the code.  Obvious
cases are things like loop unroll and inlining: they 'measure' the size of
some code to determine whether to unroll it or not. This means that it
should be enhanced to ignore debug intrinsics for the sake of code size
estimation.
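
The idea of ignoring debug intrinsics when measuring code size can be
illustrated at the text level. The sketch below is not how a pass would do
it (a real change would skip the intrinsic calls while walking the IR inside
the pass); it only shows that the measured size should be the same with and
without line number intrinsics. The helper name count_real_insts is
hypothetical, and it assumes instructions are indented lines and that debug
intrinsics appear as "call void @llvm.dbg.*" lines.

```shell
#!/bin/sh
# count_real_insts is a hypothetical helper. It reads the body of one
# function in textual IR on stdin and counts the indented, non-comment
# lines that are not calls to llvm.dbg.* intrinsics. Labels such as
# "entry:" are not indented, so they are not counted. The result is only
# an approximation of an instruction count.
count_real_insts() {
  awk '/^[ \t]+[^ \t;]/ && !/call void @llvm\.dbg\./ { n++ } END { print n + 0 }'
}
```

Running it on the same function body compiled with and without -g should
give the same number, which is the property a size-based heuristic such as
loop unrolling or inlining needs in order to make the same decision in both
builds.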

Another example is optimizations like SimplifyCFG when it merges
if/then/else into select instructions. SimplifyCFG will have to be enhanced
to ignore debug intrinsics when doing its safety/profitability analysis, but
then it will also have to be updated to just delete the line number
intrinsics when it does the xform. This is simplifycfg's way of "updating"
the debug info for this example transformation.

As we progress through various optimizations, we will find cases where it is
possible to update (e.g. loop unroll or inlining, which doesn't have to do
anything special to update line #'s) and places where it isn't.  As long as
the debug intrinsics don't affect codegen, we are happy, even if the debug
intrinsics are deleted in cases where it would be possible to update them
(this becomes an optimized debugging QoI issue).

*3.4.1    Work Plan*

1.         Update transform pass mem2reg.

2.         Test with llvm/test and llvm-test.

3.         Update transform pass simplifycfg.

4.         Test with llvm/test and llvm-test.

5.         Likewise, update transform passes globalopt, globaldce,
ipconstprop, deadargelim, instcombine...

6.         Update the other passes and test them.