[LLVMdev] Optimization passes and debug info

Wed Jul 23 10:50:39 PDT 2008

On Jul 23, 2008, at 8:08 AM, Matthijs Kooijman wrote:

> Hi Chris,
>
>> I just meant -O3 as an example.  I'd expect all -O levels to have the
>> same behavior.  -O3 may run passes which are more "lossy" than -O1
>> does though, and I'd expect us to put the most effort into making
>> passes run at -O1 update debug info.
> I'm not really sure that you could divide passes into "lossy" and
> "not so lossy" that easily.
>
> For example, SimplifyCFG will be run at every -O level.  This would
> imply that it must be a "not so lossy" pass, since we don't want to
> completely thrash debugging info at -O1.

Totally agreed.

> However, being "not so lossy" would probably mean that SimplifyCFG
> will have to skip a few simplifications. So, you will have to choose
> between two goals:
> 	1) -g should not affect the generated code
> 	2) Transformations should preserve as much debug info as possible
>
> I can't think of any way to properly combine these goals. It seems
> that goal 1) is more important to you, so giving 1) priority over
> 2) could work out, at least for the llvm-gcc code, which defines
> optimization levels in this way.

I don't see why choosing between the two goals is necessary; you can
have both.  Take a concrete example, turning:

   if (c) {
     x = a;
   } else {
     x = b;
   }

into:

   x = c ? a : b;

This is a case where our debug info won't be able to represent the
xform correctly: if the select instruction is later expanded back out
to a diamond in the code generator, we lose the line number info for
the two assignments to x and the user won't be able to step into it.
If the code generator doesn't expand it, the user gets the same
experience, and there is no way to represent (in machine code) the
original behavior.
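
To make the example concrete, the two forms can be written as C functions and checked for equivalence. The function names are hypothetical and only serve this illustration. Since neither arm has side effects, the fold cannot change the computed value; what it changes is that the folded form has a single source line where the original had one line per assignment.

```c
#include <assert.h>

/* Original form: two assignments on separate source lines, so a
   debugger can stop on whichever arm is taken. */
int with_branch(int c, int a, int b) {
    int x;
    if (c) {
        x = a;
    } else {
        x = b;
    }
    return x;
}

/* Folded form: one conditional expression (a "select") on one line;
   there is no longer a separate line per assignment to step onto. */
int with_select(int c, int a, int b) {
    int x = c ? a : b;
    return x;
}

/* The fold must not change the computed value for a given input. */
void check_fold_equivalence(int c, int a, int b) {
    assert(with_branch(c, a, b) == with_select(c, a, b));
}
```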

That said, it doesn't really matter.  This is an example where
SimplifyCFG can just discard the line number info and we accept the
loss of debug info.  Even when run at -O1, I consider this to be
acceptable.  My point is that the presence of debug info should not
affect which xforms get done, and that (as a Quality of Implementation
issue) xforms should ideally update as much debug info as they can.
If they can't update the debug info (or it is too much work to), they
can just discard it.

> However, I can imagine that having a -preserve-debugging flag, in
> addition to the -O levels, would be very welcome for developers (it
> would then make goal 2) more important than 1)). Perhaps less as an
> option to llvm-gcc than when using llvm as a library to create a
> custom compiler.

Why?  Is this an "optimize as hard as you can without breaking
debug info" flag?  Who would use it (what use case)?

> Do you agree that goal 2) should be possible (even in the long term),
> or do you think that llvm should never need it? In the latter case,
> I'll stop discussing this, because for our project we don't really
> need it (though I would very much like it myself, as an individual
> developer).

I won't block such progress from being implemented, but I can't  
imagine llvm-gcc using it.  I can see how it would make sense in the  
context of a JVM, when debugging hooks are enabled.   Assuming that  
running at -O0 is not acceptable, this is a potential use case.
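
The two policies can be contrasted in a small decision function. This is only an illustration under assumed names: `DebugPolicy`, `may_transform`, and the enumerator standing in for the proposed -preserve-debugging flag are hypothetical, not existing LLVM options. Under the default policy a pass always transforms and discards debug info it cannot update; under the alternative it skips any transformation that would leave debug info inaccurate.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical policy switch for the two goals discussed in this thread. */
typedef enum {
    DEBUG_NEVER_CHANGES_CODE,   /* default: -g must not affect codegen */
    OPT_NEVER_BREAKS_DEBUG_INFO /* e.g. a "-preserve-debugging" mode */
} DebugPolicy;

/* Decide whether a pass may apply a transformation.
   can_update_debug_info: whether the pass knows how to keep the debug
   info accurate after performing the transformation. */
bool may_transform(DebugPolicy policy, bool can_update_debug_info) {
    switch (policy) {
    case DEBUG_NEVER_CHANGES_CODE:
        /* Always transform; any debug info that can't be updated is
           discarded instead. */
        return true;
    case OPT_NEVER_BREAKS_DEBUG_INFO:
        /* Skip transformations that would leave debug info inaccurate. */
        return can_update_debug_info;
    }
    assert(!"unknown policy");
    return false;
}
```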

>> These three levels are actually a completely different approach, on
>> an orthogonal axis (reducing the size of debug info).
> I'm not really sure what you mean by this. The idea behind the levels
> is to find the balance in the optimization vs. debug info
> completeness tradeoff.

There is no balance here; the two options are:

1) debug info never changes generated code.
2) optimization never breaks debug info.

The two are contradictory (unless all optimizations can perfectly
update debug info, which they can't), so it is hard to balance
them :).  My perspective follows from the use cases I imagine for the
C family of languages; I'll admit that other languages may well want
#2.  Can you talk about why you want this?

> I totally agree with keeping debug info consistent in all cases.
> Problems occur when an optimization can't keep the debug info fully
> consistent: it must then either remove debug info or refrain from
> performing the optimization.

In my proposal, the answer is to just remove the debug info, as in
the SimplifyCFG case above.

>> I think that codegen should be controlled with -O (and friends) and
>> that -g[123] should affect the size of debug info (e.g. whether
>> macros are included, etc.).  If the default "-g" option corresponded
>> to "-g2", then I think it would make sense for "-g1" to never emit
>> location lists, for example, just to shrink debug info.
> I think that having the multiple -g options you describe is yet
> another axis, related to which debug info is generated in the first
> place.

Fair enough.
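
Under this proposal, the -g level would only select which kinds of debug info are emitted, independently of -O. A hypothetical sketch: the struct and function names are invented, and tying macros to -g3 is an assumption (it mirrors GCC's -g3) rather than something stated in this thread. Note that nothing here takes the optimization level as input, which is the point of keeping the two axes separate.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-level debug info content. */
typedef struct {
    bool emit_line_tables;    /* map instructions to source lines */
    bool emit_location_lists; /* variable locations that vary by code range */
    bool emit_macros;         /* preprocessor macro definitions */
} DebugInfoOptions;

/* Map -gN to what is emitted; plain "-g" is assumed to mean -g2. */
DebugInfoOptions debug_options_for_level(int level) {
    assert(level >= 0 && level <= 3);
    DebugInfoOptions o = { false, false, false };
    if (level >= 1) o.emit_line_tables = true;
    if (level >= 2) o.emit_location_lists = true; /* -g1 skips these to shrink output */
    if (level >= 3) o.emit_macros = true;
    return o;
}
```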

>> Does this seem reasonable?
> I think we're at least getting closer to making our points of view
> clear :-)

:)

-Chris