[llvm-commits] Updated SAFECode Patch

Thu Jul 21 00:38:25 PDT 2011

On Jul 19, 2011, at 3:36 PM, John Criswell wrote:
> Dear All,
> 
> Attached is an updated version of the SAFECode patch.  This patch:
> 
> 1) Is updated to the new LLVM type API; and
> 2) Can just be applied using patch -p0 < scpatch instead of having to untar an archive of changes.
> 3) Perhaps I forgot it in previous posts, but using patch -p0 < testpatch  in the test-suite directory will provide some test Makefiles which can be used by using "make TEST=sc report"
> 4) It includes a RewriteOOB pass which implements a feature called Out of Bounds Pointer rewriting.  Basically, it allows SAFECode to permit out of bound pointers provided that such pointers are not dereferenced.  It's an optional feature.

Hi John,

Thanks for updating this and I apologize again 10x for the delay reviewing this.  This is a *lot* of code to wade through.  Overall, the code is well structured and well commented.

Here are a few micro-level comments:

* Passes like InsertGEPChecks should be in a .cpp file that matches the pass name, and don't need the class to be exposed in a public header file (for the same reason the SCCP class (for example) is private to its own .cpp file).

* There is some '#if 0' code that should be removed.

* Did you guys actually write the asm blob in strncpy_asm?  Are you sure the code is not GPL?  It looks dead anyway, dead code should be zapped.

* Found a bug! :)

+  Constant * GAVConst = M->getOrInsertFunction ("pchk_getActualValue",
+                                                VoidPtrTy,
+                                                VoidPtrTy,
+                                                VoidPtrTy,
+                                                NULL);
+  Function * GetActualValue = dyn_cast<Function>(GAVConst);
...
+ use(GetActualValue)

The dyn_cast should be a cast<> or check for null.

* The code isn't fully detangled from the other projects in the research group, there are still references to llva, declarations but no definition of things like pool_rindex, etc.

* There are lots of minor coding style issues, such as the extra spaces in the code, some 80 col violations etc:

+static inline Value *
+peelCasts (Value * PointerOperand, std::set<Value *> & Chain) {
+  Value * SourcePointer = PointerOperand;

* Inefficient containers like std::set and std::map are used pervasively, as are verboten classes like std::ostream.

* There are linux-system specific details in the code, I suspect it's not portable to other systems.  I also see various uses of unistd.h, errno.h etc.

* Why do you have an implementation of a SplayTree?  Why is it better than a hash table or other data structure?

* The files have the old LLVM copyright header on them.

* I see that the runtime library contains a bunch of pool allocation related stuff, is that used by the transformation patches or used by other parts of safecode that can't be included because they depend on DSA?

* Where is the documentation on how to use this? :-)  What are the features and limitations of this work in the absence of DSA and the other pieces?  You've made claims that this is generally useful for partial applications and other code, is this really true how does it work?

Anyway, those are some thoughts on the first pass through the code.  As it is, the code isn't ready to go in, and even if it were, it would be impossible to really review as a monolith.  Beyond that, I have my concerns about who is going to maintain and push forward this code going forward.  I'm not accusing you guys of it, but I have a general concerned about "abandonware" in the codebase, and do routinely purge apparently dead passes from the codebase.  Do you have particular clients that are actively using this and will defend against me not purging it in a future release?

If you're driven to get this better integrated into LLVM and available to a wider community, there are two routes you can take:

1. The work can be decomposed into small and incremental pieces that each deliver specific value, and you can propose and integrate these one piece at a time.  For example, start with one of the passes and a minimal runtime library that will catch some subset of the bugs.  Once that is in, rinse and repeat for the next piece and the next piece etc.  This is a really long road for this much code, but particularly if it is delivering concrete value and there is a client that will be using it, it is the right way to get it into the project.

2. You can rework the high level hooks of this to use the new PassManagerBuilder.h interface to "hook into" extension points in the standard pass manager, rework this code to be a clang plugin, and keep the code available in the safecode.llvm.org repo.  Doing this engineering work (to make it trivial to turn on with a stock clang) is really required for #1 anyway.  This will make it much easier for people to try safecode out, and we're pushing to split other parts of the codebase out into plugins as well (e.g. the clang static analyzer and the ARC migration tool), so hopefully that plugin infrastructure will continue to develop.

I guess option #3 is to maintain the status quo and keep safecode in its own repo right now.  Depending on what your goals are, this is actually a very attractive option for the same reasons that LLDB being in its own repo is: you can choose when and how to synch up to llvm (lldb just uses a "current recommend revision # of llvm), being a subproject allows you to move more quickly and make changes when it makes sense to you.

Anyway, I guess my message is that the route forward depends on what your goals are.  If your goal is to get it into mainline LLVM to get more users, then I think that things are backwards: we really want there to be end users driving things into the compiler, not the other way around.  If you're looking to reduce the cost of merging mainline API changes, there are *many* opportunities to adopt the C API in these passes, which would greatly reduce the amount of churn required to keep stuff working.  Even though there isn't C API for everything, there is for the most volatile stuff.

I hope this helps,

-Chris