[llvm-commits] [PATCH] Refactoring MemoryBuiltin Analysis

Nick Lewycky nlewycky at google.com
Mon Jun 18 16:57:49 PDT 2012


Thanks for taking this on, as you've noticed LLVM needs cleanup here. I
didn't make it all the way through your patch, but I have many comments on
it. I think your patch will need some substantial work before it can go in,
but I'm happy with the changes to the analysis/transform passes!

+#include "llvm/ADT/DenseMap.h"
+#include "llvm/Support/DataTypes.h"
+#include "llvm/Support/IRBuilder.h"
+#include "llvm/ADT/SmallPtrSet.h"
+#include "llvm/Support/TargetFolder.h"

Please sort.

+/// isAllocationFn - returns true if V is a call to an allocation function
+/// (either malloc, calloc, realloc, or strdup like).
+bool isAllocationFn(const Value *V, bool LookThroughBitCast = false);

Don't reuse "allocation function" in its definition, instead try something
like "V is a call to a function that allocates memory".

+/// isAllocLikeFn - returns true if V is a call to a malloc, calloc, or
strdup
+/// like function.
+bool isAllocLikeFn(const Value *V, bool LookThroughBitCast = false);

You've got to include some more detail in the doxystrings here to
differentiate between isAllocationFn, isAllocLikeFn and isMallocLikeFn!

+/// isCallocLikeFn - returns true if V is a call to a calloc-like function.
+bool isCallocLikeFn(const Value *V, bool LookThroughBitCast = false);

Similarly; calloc-like means that the returned memory is zero-initialized?

+// FIXME: can use unions here to save space
+struct CacheData {
+  APInt Offset;
+  Value *OffsetValue;
+  APInt Size;
+  Value *SizeValue;
+  bool ReturnVal;
+  CacheData() {}
+  CacheData(APInt Off, Value *OffVal, APInt Sz, Value *SzVal, bool Ret) :
+    Offset(Off), OffsetValue(OffVal), Size(Sz), SizeValue(SzVal),
+    ReturnVal(Ret) {}
+};

What are these? Each member should have a comment. Are "APInt <X>" and
"Value *<X>Value" mutually exclusive? If so, why have a constructor that
specifies all of them? Also, what does it really mean to construct the
empty one, besides the fact that you can insert it into a DenseMap?

Also, put the ":" of the initialize list on the same line as the first
variable being initialized.

+struct ObjectSizeAnalysis {
+  /// initialize analysis. If ConstOnly is false, then the APIs of this
analysis

Use "class" since it does have private members. Capitalize "Initialize" as
the start of the sentence.

I think this should really be two pieces, one which determines the size of
things which can be done without modifying the code and a wrapper around
that which builds up IR when that fails. Here's roughly what I have in mind:

  typedef SizeOffsetType std::pair<Value *, Value *>;
  class ObjectSizeOffsetVisitor : public
InstVisitor<ObjectSizeOffsetVisitor, SizeOffsetType> {
  public:
    // ...
    SizeOffsetType visitSelectInst(SelectInst &SI) {
      SizeOffset TrueSide = get(SI.getTrueValue());
      SizeOffset FalseSide = get(SI.getFalseValue());
      if (TrueSide == FalseSide) {
        SizeOffset = make_pair(TrueSide, SizeOffset.second);
      }
    }
    // ...
  private:
    SizeOffsetType get(Value *) { /*check cache, call this->visit(Value *)
on miss*/ }
    // ...
    SizeOffsetType SizeOffset;  // initialize to (NULL, NULL)
 };

class ObjectSizeEvaluator : public InstVisitor<ObjectSizeEvaluator,
std::pair<Value *, Value *> > {
  // ...
  SizeOffsetType visitSelectInst(SelectInst &SI) {
    SizeOffsetType SizeOffset = V.visitSelectInst(SI);
    if (!SizeOffset.first || !SizeOffset.second) {
      SizeOffsetType TrueSide = this->get(SI.getTrueValue());
      SizeOffsetType LeftSide = this->get(SI.getTrueValue());
      if (!SizeOffset.first)
        Builder.CreateSelect(SI->getCondition(), TrueSide.first,
FalseSide.first);
      if (!SizeOffset.second)
        Builder.CreateSelect(SI->getCondition(), TrueSide.second,
FalseSide.second);
    }
  }
  // ...
  ObjectSizeOffsetVisitor V;
  IRBuilder Builder;
}

Does that make sense? You'll need to implement your own visit() method that
handles Arguments and GEP constantexpr's and such, but instructions should
go through the existing InstVisitor interface.

+  ObjectSizeAnalysis(const TargetData *_TD,
+                     LLVMContext &_Context,
+                     bool _ConstOnly,
+                     bool _RoundToAlign = false) :

Please don't use leading underscores! That's reserved. I personally like to
just use the exact same names as the members (yes, the scoping rules work
out such that "TD(TD)" will initialize TD the member from TD the
constructor argument) but if you don't like that place the underscores on
the end.

Index: lib/Analysis/IPA/GlobalsModRef.cpp
===================================================================
--- lib/Analysis/IPA/GlobalsModRef.cpp (revision 158666)
+++ lib/Analysis/IPA/GlobalsModRef.cpp (working copy)
@@ -329,15 +329,8 @@
       // Check the value being stored.
       Value *Ptr = GetUnderlyingObject(SI->getOperand(0));

-      if (isMalloc(Ptr)) {
-        // Okay, easy case.
-      } else if (CallInst *CI = dyn_cast<CallInst>(Ptr)) {
-        Function *F = CI->getCalledFunction();
-        if (!F || !F->isDeclaration()) return false;     // Too hard to
analyze.
-        if (F->getName() != "calloc") return false;   // Not calloc.
-      } else {
+      if (!isNoAliasFn(Ptr))
         return false;  // Too hard to analyze.
-      }

I'm not sure this is right.. There's another assumption that globals
mod-ref is making which isn't in the comments; it's assuming that the
called functions (malloc and calloc) have no side-effects other than the
allocation of memory. In general, the callee needs to mark
"FunctionInfo[Writers[i]].GlobalInfo[I] |= Mod" when isNoAlias but can
remove that marking when it really is malloc/calloc.

+enum AllocType {
+  MallocLike         = 1<<0, // allocates
+  CallocLike         = 1<<1, // allocates + bzero
+  ReallocLike        = 1<<2, // reallocates
+  StrDupLike         = 1<<3,
+  AllocLike          = MallocLike | CallocLike | StrDupLike,
+  AnyAlloc           = MallocLike | CallocLike | ReallocLike | StrDupLike
+};

I don't understand why StrDup gets special handling? "fopen" also returns
newly allocated memory. What they have in common is the noalias marker. Are
you using this like calloc to indicate that the memory is initialized to a
copy of other memory? Please add comments.

+/// isAllocationCall - returns the index of the AllocationFnData array if
V is
+/// a call to a known allocation function, and -1 otherwise.

This is a very strange function. Why not return a pointer to the array
element or NULL if that's the only legal use of its return? Why does its
name start with "is"?

+static unsigned isAllocationCall(const Value *V, AllocType AllocTy,
+                                 bool LookThroughBitCast) {
+  Function *Callee = getCalledFunction(V, LookThroughBitCast);
+  if (!Callee)
+    return -1;
+
+  unsigned i = 0;
+  bool found = false;
+  for ( ; i < sizeof(AllocationFnData)/sizeof(*AllocationFnData); ++i) {

Please use i < array_lengthof(...) from ADT/STLExtras.h.

+    if (Callee->getName() == AllocationFnData[i].Name) {
+      found = true;
+      break;
+    }
+  }
+  if (!found)
+    return -1;
+
+  if ((AllocationFnData[i].AllocTy & AllocTy) == 0)
+    return -1;

+STATISTIC(ChecksUnableInterproc, "ObjectSizeAnalysis: unable interproc");
+STATISTIC(ChecksUnableLoad, "ObjectSizeAnalysis: unable LoadInst");

Please use full text in the statistic descriptions, see the others for
example.

+#define GET_VALUE(Val, Int) \
+  if (!Val) \
+    Val = ConstantInt::get(IntTy, Int)

Please don't use macros. We really hate macros. Hopefully you can structure
the code so that you won't need this?



Nick

On 18 June 2012 14:19, Nuno Lopes <nunoplopes at sapo.pt> wrote:

> Hi,
>
> Right now, there are multiple implementations of analysis of memory
> allocation functions across the LLVM codebase, including:
>  - computing size of pointed objects (dead store elimination, instcombine,
> basic alias analysis)
>  - recognize allocation functions (e.g., to know that the return value
> doesn't alias anything else, or to recognize uninitialized allocated memory
> for undef purposes).
>
> Therefore I send in attach a patch to refactor the MemoryBuiltin Analysis
> to include all the functionality needed by the other passes just mentioned.
> It removes a lot of code duplication, and the proposed implementation is
> better than all the previous ones combined. The patch adds support for the
> noalias function attribute as well.
> This patch also adds support for computing the size of pointed objects at
> run-time, which is basically a copy from the bounds checking pass. After
> this patch goes in, most of the code in the bounds checking pass can be
> removed.
>
> Any comments/suggestions/etc..?
>
> Thanks,
> Nuno
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20120618/470efa07/attachment.html>


More information about the llvm-commits mailing list