[llvm-dev] Pool allocator + safecode

Thu Oct 8 14:14:59 PDT 2015

On 10/8/15 5:37 AM, Ed Robbins wrote:
> Thanks for the fast response John.
>
> On Thu, Oct 1, 2015, at 04:51 PM, John Criswell wrote:
>> Dear Ed,
>>
>> First, someone has updated the DSA code in the poolalloc project to LLVM
>> 3.7, and a Master's student worked for me over the summer to update a
>> large chunk of SAFECode to LLVM 3.7.  However, the update to LLVM 3.7
>> isn't finished (we need to finish integrating SAFECode back into Clang),
>> and my student has opted to focus on his studies instead of finishing
>> the update, so the current status of SAFECode for LLVM 3.7 is that it's
>> nearly done but suspended.
>>
>> If you want the currently updated code, you can get it at
>> https://github.com/jtcriswell/safecode-llvm37.
> It's great that someone has begun work on this! I will look at it now.
>
> Can I ask what happened to the llvm-gcc (or dragonegg) front end? I
> haven't looked into how it works too much yet (I'm just starting out
> with LLVM/clang etc), but if integration with the clang front end works
> do you get dragonegg integration for free? From the web page I'm not
> sure if dragonegg is also bit-rotting, but a gcc front end would be
> useful for me.

The original llvm-gcc (in which we modified gcc to produce LLVM bitcode) 
no longer exists.  When GCC was extended to use plugins, someone created 
the DragonEgg plugin.  I am not sure if that still exists.

Is there a reason you need a GCC-based front-end?  Clang is intended to 
be a drop-in replacement for GCC and supports many of the GCC extensions.

FWIW, integrating SAFECode into Clang does not automatically give you 
DragonEgg support.  Integrating into Clang simply makes it easier to run 
the SAFECode transformation passes after the LLVM passes have been run.

>> Second, if you need the pool allocation transform, then I must sadly
>> disappoint.  The Automatic Pool Allocation (APA) transformation has
>> suffered bit-rot over the years, and we've stopped using it.  We haven't
>> updated it to work with LLVM 3.7 and currently have no plans to do so.
>>
>> The code is still publicly available, so if you are sufficiently
>> motivated, you can update it and fix it if you would like.
> I do need the APA transform. I'm a bit confused as to how the DSA code
> is useful without APA? I'll cross my fingers and hope that the API
> changes to 3.7 aren't too bad. Am I right in thinking that llvm-3.2 is
> the last time this worked? Does the above github repo include the latest
> APA stuff (albeit disabled)?

DSA is a points-to and call graph analysis.  As such, it is used for 
many things in addition to APA and SAFECode.  For example, the SMACK 
verifier uses DSA.

The APA code has been bit rotting for awhile.  While you can compile and 
run it with LLVM 3.2, it won't work as well as it did for the paper, and 
it probably needs some work to make it robust.  Please remember that APA 
has always been a research prototype.

>   
>> Just out of curiosity, what are you trying to accomplish? Depending on
>> what you need, there may be simpler approaches that will require less
>> engineering effort.
>>
> We really want the all singing all dancing safecode framework with APA
> as detailed in the 2005 TECS SafeCode paper (Memory Safety Without
> Garbage Collection for Embedded Applications). We are trying to build a
> C based embedded system that is type safe at the lowest possible run
> time cost. So I am also going to modify the uninitialized pointer MMU
> based stuff to work with the ARM Cortex M3 MPU. I don't think there are
> any shortcuts here (I'd be happy to be proved wrong though) - we need
> APA.

I see.

If you just need to isolate processes on your system so that they don't 
overwrite each other, it looks like the ARM Cortex M3 MPU can do that.  
Is there a reason why that won't work?

However, if you want to isolate processes using inferred type safety, 
you will need to address three significant challenges:

1) Due to changes in LLVM, DSA has a difficult time inferring types.  In 
a nutshell, some LLVM optimizations turned typed-getelementptr (GEP) 
instructions into GEPs that use byte-level indexing.  DSA was written 
with the assumption that GEPs carried high-level type information.

I think this is fixable by refactoring DSA's type inference code to act 
more like Value Set Analysis: it would use a map between a 4-tuple and a 
type.  The 4-tuple (a, b, c, d) describes a formula ax + b in which a 
and b denote offset and stride and c and d place a limit on the lower 
and upper bounds of x.  c and d can be +- infinity, allowing the 4-tuple 
to denote a type that occurs in an unbounded array.

As an FYI, there are multiple research groups interested in accurate 
points-to analysis results (mine included).  I'll be holding a BoF at 
the LLVM Developer's meeting this year to discuss who needs what and if 
there's a way to develop it.

2) Making APA robust.  In my personal opinion, the original APA may be 
more complicated than necessary for your application.  APA currently 
creates context sensitive pools, which means that it needs to pass pool 
handles around.  This requires transforming function signatures which 
makes inter-operating with external code a real pain.  It also makes the 
transform complicated.

A simpler design would be to create one pool for each type that DSA 
infers.  The points-to analysis would not be sound (unless it unified 
all points-to sets that have the same type), but it would allow all 
pools (or nearly all pools) to be global variables, greatly simplifying 
the transformation.  It would also continue to prevent the application 
from accessing data outside its allocated memory.

3) The version of SAFECode in that paper used the Omega constraint 
solver to prove that array accesses are safe.  That code has long 
bit-rotted away, and its implementation was not the most efficient (it 
exec()'ed the Omega solver for every query).  A better approach today 
would be to integrate the constraint solver into the compiler proper.  
Additionally, you now have other tools available, such as CVC4, Z3, and 
SMACK/Boogie, for building and solving the constraints.

As an FYI, later versions of SAFECode use run-time checks for 
type-unsafe pools and array accesses so that it can support the full 
generality of C (see Dhurjati's PLDI 2007 paper and my SVA 
publications).  You could probably do something simple in which 
type-safe data goes into pools and type-unsafe data goes into a large 
area for which loads and stores are subjected to simple SFI-like 
instrumentation (e.g., Google Native Client).  There may be solutions in 
the middle of the spectrum between fast-but-difficult-to-implement and 
slow-but-easy-to-implement.

Regards,

John Criswell

> Thanks again,
> Ed
>
>> Regards,
>>
>> John Criswell
>>
>>
>> On 10/1/15 10:51 AM, Ed Robbins via llvm-dev wrote:
>>> Hi,
>>> I'm trying to get the pool allocator and safe code building against llvm
>>> trunk. I've run into a build error, and I see that in the past another
>>> user was told just not to build the pool allocator for use with safecode
>>> [1]. However, I really want the pool allocator transforms, so I just
>>> wanted to check why the suggestion was not to use it. Has it been
>>> superseded in some way by something else? Or is it just broken at the
>>> moment? I'll assume the latter for now and see if I can fix it...
>>>
>>> Cheers
>>>
>>> [1]http://lists.llvm.org/pipermail/llvm-dev/2015-July/088739.html
>>>
>> -- 
>> John Criswell
>> Assistant Professor
>> Department of Computer Science, University of Rochester
>> http://www.cs.rochester.edu/u/criswell
>>

-- 
John Criswell
Assistant Professor
Department of Computer Science, University of Rochester
http://www.cs.rochester.edu/u/criswell