[LLVMdev] SSI in LLVM

Fri May 15 10:27:33 PDT 2009

Dear LLVM Community,

   I am one of the summer of coders working on LLVM this year. My 
project is to implement the ABCD algorithm for array bounds checking, 
and also a bitwidth analysis that maps variables to an approximation of 
its size in bits. To implement this, I will have to simulate a 
intermediate representation called SSI (Static Single Information) form 
on top of LLVM SSA representation. The SSI form has the property that 
each use of a variable post-dominates its definition. Also, if there are 
two uses of the same variable, say, u(v1) and u(v2), then, either u(v1) 
dominates u(v2) or vice-versa.

I would like to discuss my approach with you guys, so that you can 
redirect me if I am going through a bad path, so I am listing some 
points below:

1) I want to implement a very modular design. In this way, I will have 
an analysis that receives the intervals produced by 
LiveIntervalAnalysis, plus a list of variables, and creates a new set of 
intervals, so that each interval designates a new virtual variable, that 
is visible only inside my analysis. These variables have the SSI 
property. In this way, it is possible to map constraints such as (a > 
10) to an interval.

2) Each client gives to my analysis the variables that it wants mapped 
to SSI. For instance, the ABCD client passes all the variables that are 
used as indexes of arrays, or as array sizes. A pass to eliminate dead 
code passes all the variables used inside conditionals, and the pass 
that does bitwidth analysis passes all the variables of type int, for 
instance.

3) This implies that each client will have to traverse the program 
representation harvesting the variables of interest. My analysis will 
take care of simulating the SSI representation for those variables.

4) Queries can be made given an instruction index, as computed by 
LiveIntervalAnalysis. For instance, a typical query would be: is a > x 
at program point 110.

5) Keeping the intervals ordered, we can answer queries in O(ln N), 
where N is the maximal program index.

I would like to have critics on this approach so it can be well thought
before implementation to reduce reimplementation. In particular, to use 
this technique, my analysis must work at the MachineFunction level, as 
it must run after LiveIntervalAnalysis. Do I miss essential information 
at this level, compared to the Function level? I mean, is it possible to 
analysis conditionals to discover things like a > 10, or a == 10, etc?

Please, feel free to ask me any clarification you may think about. I 
would really appreciate any comments and thoughts you guys may have.

Thanks,

-- 
Andre Tavares
Master Student in Computer Science - UFMG - Brasil
http://dcc.ufmg.br/~andrelct