[llvm-dev] LLVM: mapping unoptimized IR back to clang AST

Joshua Cranmer 🐧 via llvm-dev llvm-dev at lists.llvm.org
Wed Sep 16 06:48:19 PDT 2015


On 9/15/2015 4:10 AM, Tomas Kalibera via llvm-dev wrote:
>
> Hi,
>
> I would like to rewrite a C program based on analyzing the LLVM IR of 
> that C program, produced by CLANG. Does anyone have any hints on how to 
> map the IR back to CLANG AST?
>
> Can I do better than invoking clang with "-g -O0" to produce an IR for 
> this task? (can I get more debug info, disable more optimizations?)
>
> The debug info in LLVM IR does not seem to have information on C 
> macros (while CLANG AST does) - is there a way to get that information 
> from the IR?

Going from C to IR is inherently a lossy transformation. Macros don't 
exist except at a pre-lexing stage (although clang does retain them 
through semantic analysis). There are several ASTs that would map to the 
same IR (for example, for (;;) {}, while (1) {}, and do {} while (1); 
are all the exact same control flow, yet have different ASTs). While it 
might be the case that you could do some disambiguation based on things 
like basic block names, such introspection would be highly brittle and 
likely to break if any optimization pass is run. Note that if you don't 
run some basic optimization passes (such as mem2reg), the IR is going to 
be much harder to analyze (e.g., you would need def-use tracking through 
memory to do basic constant propagation of local variables!).
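
To make the loop example concrete, here is a minimal sketch in C. The 
function names (count_for, count_while, count_do, check_loops_agree) are 
made up for illustration and are not from the original question. All three 
functions have the same control flow: test, maybe break, increment, repeat.

```c
#include <assert.h>

/* Hypothetical helpers: each counts up to n using a different loop
 * spelling. The control flow is the same in all three. */
int count_for(int n) {
    int i = 0;
    for (;;) {
        if (i >= n) break;
        i++;
    }
    return i;
}

int count_while(int n) {
    int i = 0;
    while (1) {
        if (i >= n) break;
        i++;
    }
    return i;
}

int count_do(int n) {
    int i = 0;
    do {
        if (i >= n) break;
        i++;
    } while (1);
    return i;
}

/* Sanity check that the three spellings behave identically for n. */
void check_loops_agree(int n) {
    assert(count_for(n) == count_while(n));
    assert(count_while(n) == count_do(n));
}
```

As far as I know, unoptimized clang output tends to label blocks after the 
construct that produced them (names along the lines of for.body, 
while.body, do.body), which is the kind of hint I mean above, but those 
names are not something to rely on. Note also that at -O0 the counter i 
lives in a stack slot and is read and written through loads and stores; 
that memory traffic is what mem2reg removes.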

Or, put another way, if you try to do C -> IR -> C, you will have to 
accept that the output C may look nothing like the input C.
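
Macros are one concrete source of that mismatch. In the sketch below 
(SQUARE, area_macro, area_plain and check_areas are hypothetical names 
chosen for illustration), the two functions differ only in whether the 
expression was spelled through a macro. After preprocessing, both bodies 
are the same token sequence, so nothing downstream of the preprocessor 
needs to know that SQUARE ever existed.

```c
#include <assert.h>

/* Hypothetical macro, for illustration only. */
#define SQUARE(x) ((x) * (x))

/* Written with the macro. */
int area_macro(int side) {
    return SQUARE(side);
}

/* Written out by hand: the same tokens the macro expands to. */
int area_plain(int side) {
    return ((side) * (side));
}

/* The two spellings compute the same value. */
void check_areas(int side) {
    assert(area_macro(side) == area_plain(side));
}
```

A tool working from the IR alone would have no basis for choosing between 
these two spellings when regenerating the C.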

> Is it possible to add some custom meta-data to CLANG AST nodes that 
> would somehow propagate through CLANG to the LLVM IR?
> I could think of wrapping some AST nodes into dummy function calls, 
> but that seems rather crude.
>
> Indeed, some analysis can also be done at AST level, but it seems to 
> me that it is easier to do at the IR level. Also the IR level has 
> linking information, one can do inter-procedural analyses.

If you're trying to rewrite C code, you'll want to do that all at AST 
level. It is too lossy to convert to IR and back again.

-- 
Joshua Cranmer
Thunderbird and DXR developer
Source code archæologist


