[LLVMdev] Marking source locations without interfering with optimization?

Mon Aug 22 22:41:48 PDT 2005

On Fri, 19 Aug 2005, Michael McCracken wrote:

> I've been thinking of adding an instruction, and I'm following the
> advice in the docs to consult the list before doing something rash.

Always a good idea! :)  Instead of adding an instruction, I'd suggest
adding an intrinsic.  You can mark intrinsics as not reading or writing
memory.  For an example, see lib/Analysis/BasicAliasAnalysis.cpp and look
for llvm.isunordered to see how it is handled.

> What I want to do is provide a way to identify variable names and
> source locations that doesn't affect the effectiveness of
> optimizations. This is not the same problem as supporting debug info,
> because I don't care about being able to look up unique names for
> memory locations or evaluating expressions, etc... I just want to be
> able to say during an optimization pass what the best guess for the
> source location and variable names are for a value or instruction that
> the pass is doing something interesting to.

Okay...  this is tricky.  Anything that binds directly to a variable will
prevent the optimizers from modifying that variable.  I would suggest
something like this (C-like syntax for the LLVM code):

int foo() {
   %A = alloca int
   llvm.myintrinsic("A", "whatever data you want")
}

> Because I don't need to support the functionality of a debugger with
> this, it is OK if that best guess contains more than one possibility,
> as long as it isn't a huge number of possibilities. The idea is that
> I'm producing information for a programmer who needs to know what is
> going on during optimization, so I want to give them as much detail as
> possible, it's OK if it isn't exact, but it is not OK if it interferes
> with the optimization, because that's the whole point.

Given the above, you can use the constant string "A", to look up things in 
the symbol table of the function.  You will probably want to accept "A" 
and anything that starts with "A.".
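
To make the matching rule concrete, here is a minimal sketch in plain C++. 
It deliberately does not use LLVM's symbol table API: the helper names 
(matchesMarkedName, candidatesFor) are made up for illustration, and the 
symbol table is assumed to be available as a simple list of name strings.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: returns true if symbolName is exactly the marked
// name ("A") or starts with the marked name followed by a dot ("A.1").
bool matchesMarkedName(std::string_view symbolName, std::string_view marked) {
  if (symbolName == marked)
    return true;
  return symbolName.size() > marked.size() &&
         symbolName.substr(0, marked.size()) == marked &&
         symbolName[marked.size()] == '.';
}

// Hypothetical helper: collects every name that may refer to the marked
// variable. The input stands in for the function's symbol table.
std::vector<std::string> candidatesFor(const std::vector<std::string> &symbolNames,
                                       std::string_view marked) {
  std::vector<std::string> result;
  for (const std::string &name : symbolNames)
    if (matchesMarkedName(name, marked))
      result.push_back(name);
  return result;
}
```

Requiring the dot after the prefix keeps an unrelated variable such as "AB" 
from being reported as a candidate for "A".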

> So, given those goals, it seems that just using the traditional debug
> info as it is designed is not a good idea, since I want more and
> fuzzier answers.

Makes sense.

> Also, unless I'm missing something, the debug info uses intrinsic
> function calls, which are treated as un-analyzable, and if I tried
> supplying those with actual values to link the values to the source,
> then some important analyses will fail. Is that right or am I
> misunderstanding the docs on intrinsics?

Correct.

> So, I thought one way to go would be to introduce an instruction meant
> just for marking the source location of a value - it'd consume a value
> and some constants marking the location - then the front end could
> generate it (not by default!) where necessary to make sure a value
> could be traced back to its source location. It'd either be lowered
> away or it'd have to be ignored during codegen since we might still
> want to know that info then, for instance, to track register spills
> back to which variable spilled.

I think the above will work for you: you can have it ignored or deal with
it however you want using the intrinsic lowering code.  Check out how
other intrinsics are handled (e.g. llvm.isunordered, which is handled by
the code generators, and llvm.dbg.*, which are not) for ideas.

> What problems can you think of with that approach? Am I asking for
> trouble with passes, or would a semantically meaningless 'marker'
> instruction be OK?

I'd seriously suggest using an intrinsic instead of an instruction: they
are far, far easier to add.  Aside from that, using the symbol table is
really the only thing that will work.  It is prone to obvious problems,
but should work pretty well in practice.

> If you have suggestions for a better way to do this, that'd be great.
> There isn't a lot of prior work I found on this, most of what I saw
> was about debug info, which as I stated, is not quite what I need.

Hope this helps!

-Chris

-- 
http://nondot.org/sabre/
http://llvm.org/