[cfe-dev] Matching Clang's AST nodes to the LLVM IR instructions they produced.

John McCall rjmccall at apple.com
Mon Jan 16 19:06:14 PST 2012


On Jan 16, 2012, at 6:30 PM, Matthew Heinsen Egan wrote:
> I need to be able to determine, from an LLVM IR instruction, the
> specific node in the AST that produced that instruction. For example,
> if I had a C program containing the following:
> 
>  // int a, b, c
>  (a * b) + c
> 
> Which produced the following:
> 
>  %0 = load i32* %a.addr, align 4
>  %1 = load i32* %b.addr, align 4
>  %mul = mul nsw i32 %0, %1
>  %2 = load i32* %c.addr, align 4
>  %add = add nsw i32 %mul, %2
> 
> I would like to be able to determine that %mul represented the value
> of (a * b). The IR will be unoptimized, and I'm only interested in C
> language programs at this stage.
> 
> What is the best way to approach this? I was thinking of modifying
> codegen to add some metadata that mapped back to the AST (similar to
> EmitDeclMetadata), but I imagine that having to maintain the
> modifications would cause difficulty with keeping up-to-date with
> Clang. Ideally I would like to be able to use some sort of plugin, or
> extend codegen from outside of Clang's source tree, so that I can
> avoid these issues and use a pure version of Clang. Any pointers on
> how to achieve this would be greatly appreciated.

There is no way to do this without modifying the Clang source.

The main problem is going to be deciding what the current expression
is, but I think that's quite achievable.  IR-gen uses recursive descent;
all you really need to do is maintain a stack of what expressions are
currently being emitted.  That's relatively easily done with RAII
objects in all the major "dispatch" functions that switch over all
possible expression kinds.  That should be a pretty modest number
of modifications:
  - CodeGenFunction::EmitLValue
  - ScalarExprEmitter::Visit
  - ComplexExprEmitter::Visit
  - AggExprEmitter::Visit
  - probably somewhere in CGStmt.cpp
  - maybe a few other random places like the short-circuit evaluator
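
As a rough illustration of the expression stack described above, here is
a minimal, self-contained sketch. It does not use Clang's headers: Expr,
EmissionContext, ExprScope and visitBinary are hypothetical stand-ins
(for clang::Expr, for state that would presumably live in
CodeGenFunction, for the RAII guard, and for a dispatch function such as
ScalarExprEmitter::Visit). It only shows the push-on-entry, pop-on-exit
shape of the idea.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for clang::Expr; a real patch would use the AST node.
struct Expr {
  std::string Text;
};

// Hypothetical per-function state; in Clang this would presumably be kept
// alongside the rest of the function-level codegen state.
struct EmissionContext {
  std::vector<const Expr *> ExprStack;

  // Innermost expression currently being emitted, or nullptr if none.
  const Expr *current() const {
    return ExprStack.empty() ? nullptr : ExprStack.back();
  }
};

// RAII guard: pushes the expression on construction, pops it on destruction,
// so the stack stays balanced on every exit path from a dispatch function.
class ExprScope {
  EmissionContext &Ctx;

public:
  ExprScope(EmissionContext &C, const Expr *E) : Ctx(C) {
    Ctx.ExprStack.push_back(E);
  }
  ~ExprScope() {
    assert(!Ctx.ExprStack.empty() && "unbalanced expression stack");
    Ctx.ExprStack.pop_back();
  }
  ExprScope(const ExprScope &) = delete;
  ExprScope &operator=(const ExprScope &) = delete;
};

// Stand-in for a dispatch function. Log records which expression was
// current at each point where an instruction would have been emitted.
void visitBinary(EmissionContext &Ctx, const Expr &Outer, const Expr &Inner,
                 std::vector<std::string> &Log) {
  ExprScope OuterScope(Ctx, &Outer);
  {
    ExprScope InnerScope(Ctx, &Inner);
    Log.push_back(Ctx.current()->Text); // emitted while visiting Inner
  }
  Log.push_back(Ctx.current()->Text); // emitted while visiting Outer
}
```

With Outer as "(a * b) + c" and Inner as "a * b", the first log entry
names the multiplication and the second names the addition, which is the
attribution asked for in the original question.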

Once you've got that, it's relatively easy to slap custom metadata
on every new instruction as it's inserted;  just change the
CGBuilderTy typedef to make IRBuilder use a custom inserter class.
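
To make the inserter idea concrete, here is a self-contained sketch of
the pattern only. It deliberately avoids LLVM's headers, because the
exact IRBuilder template parameters and inserter hook have changed
between LLVM releases. Instruction, Builder, TaggingInserter and
TaggingBuilder are all hypothetical names invented for this example; the
final alias plays the role that changing the CGBuilderTy typedef would
play in Clang.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for clang::Expr.
struct Expr {
  std::string Text;
};

// Hypothetical stand-in for llvm::Instruction plus its attached metadata.
struct Instruction {
  std::string Name;
  const Expr *Origin = nullptr; // would be a metadata node in real IR
};

// Inserter policy: every instruction passing through is tagged with the
// innermost expression on the stack before being appended to the block.
class TaggingInserter {
  const std::vector<const Expr *> *ExprStack;

public:
  explicit TaggingInserter(const std::vector<const Expr *> &Stack)
      : ExprStack(&Stack) {}

  void insert(Instruction &I, std::vector<Instruction> &Block) const {
    if (!ExprStack->empty())
      I.Origin = ExprStack->back();
    Block.push_back(I);
  }
};

// Builder parameterized over its inserter, modelled loosely on the way the
// real builder delegates insertion to a policy class.
template <typename Inserter> class Builder {
  Inserter Ins;
  std::vector<Instruction> Block;

public:
  explicit Builder(Inserter I) : Ins(std::move(I)) {}

  // All instruction creation funnels through the inserter.
  void create(const std::string &Name) {
    Instruction I;
    I.Name = Name;
    Ins.insert(I, Block);
  }

  const std::vector<Instruction> &instructions() const { return Block; }
};

// The one-line switch: all codegen code using this alias picks up tagging.
using TaggingBuilder = Builder<TaggingInserter>;
```

Because every instruction is created through the builder, the tagging
lives in one place and the individual emit functions need no changes
beyond maintaining the expression stack.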

John.


