[LLVMdev] Google Summer of Code proposal: Adding memory safety checks to the LLVM bitcodes

Mon Apr 2 11:40:13 PDT 2012

Dear John,

    in the last e-mail we'd exchanged I forgot to CC the llvmdev list,
so I am copying the e-mail back to the list.

> One thing I should point out is that SAFECode and SoftBound (ideally) should be using a static array bounds checking algorithm that is sound with respect to integer overflow.  For example, x < x+5 is always true if x is an integer, but it's not always true if x is a 32-bit two's complement number; additional constraints on x are needed to prove that x < x+5 is true.
>
> Does your analysis take two's complement overflow/underflow into account?

No. The less-than lattice only reports if a variable is less than
another, assuming that your lattice is {-inf, ... -2, -1, 0, 1, 2, ...
+inf}. But I know of another lattice that takes overflows into
consideration. It is called the Pentagon domain, and it was published
by Logozzo and Fahndrich in the paper [1]:

[1] Francesco Logozzo, Manuel Fähndrich: Pentagons: A weakly
relational abstract domain for the efficient validation of array
accesses. Sci. Comput. Program. 75(9): 796-807 (2010)

Pentagons combine the less-than lattice with the interval lattice to
have an analysis that is sound in terms of wrapping arithmetics. We
have an implementation of range analysis here at UFMG. It also uses
the lattice {-inf, ... -1, 0, 1, ... +inf}, but it is safe to use
limits that are bound by values different than infinity, e.g., [10,
100], for instance. On the other hand, [10, +inf] could wrap around,
thus compromising the lower bound, and it is not safe to use it.

>
> Just to make sure I understand: you're using abstract interpretation to map SSA values to elements in some lattice.  In your standard value range analysis, you're assigning concrete values ranges (e.g., (4,10)) to each SSA value.  In the "less-than" domain, you're mapping each LLVM SSA value to a set of SSA values which are greater than or equal to the SSA value (e.g., x -> [a, y, z] means that x <=a and x <= y and x <=z).  Is this correct?

Yes!

>
> Interesting.  One problem with on-demand analysis (at least the way it's done in LLVM for things like alias analysis) is that it gets tricky for inter-procedural analyses.
>
> With the less-than domain, I suppose what you could do is have the "user" of the analysis first specify which variable relationships they want to query.  With that information, your 'less-than' analysis runs as usual with the exception that it ignores variables not queried by the "user", thus making the sets of variables smaller.  The run-time is still cubic, but the input size is smaller.

Yes: that is the idea of doing it on demand. And the analysis must be
context-sensitive too. Venet and Brat had discussed it before in the
paper [2]. One way to get a bit of context sensitiveness is with
function inlining, that LLVM already does for us.

[2] Precise and Efficient Static Array Bound Checking for Large
Embedded C Programs, Arnaud Venet and Guillaume Brat.

Thus, to summary my GSoC project, I would implement an array bounds
check elimination engine, after the paper [1], using the Pentagon
domain, which combines the less-than lattice with the interval
lattice. The analysis would be context sensitive. I think to do all of
it in three months is too much, but I could implement the pentagon
domain, and then use function inlining to get some sort of context
sensitiveness. I would pursue this project further, as my Master
dissertation. Then I would try to implement true context
sensitiveness.

What do you think? Could I give it a try and rewrite my initial proposal?

By the way, I've been following your discussion regarding Victor's
proposal. We worked together in the implementation of the range
analysis engine.

regards,

Raphael.