<div style="font-family:arial,helvetica,sans-serif"><font><div>Thanks for taking this on, as you've noticed LLVM needs cleanup here. I didn't make it all the way through your patch, but I have many comments on it. I think your patch will need some substantial work before it can go in, but I'm happy with the changes to the analysis/transform passes!</div>


<div><br></div><div>+#include "llvm/ADT/DenseMap.h"</div><div>+#include "llvm/Support/DataTypes.h"</div><div>+#include "llvm/Support/IRBuilder.h"</div><div>+#include "llvm/ADT/SmallPtrSet.h"</div>


<div>+#include "llvm/Support/TargetFolder.h"</div><div><br></div><div>Please sort.</div><div><br></div><div><div>+/// isAllocationFn - returns true if V is a call to an allocation function</div><div>+/// (either malloc, calloc, realloc, or strdup like).</div>


<div>+bool isAllocationFn(const Value *V, bool LookThroughBitCast = false);</div></div><div><br></div><div>Don't reuse "allocation function" in its definition, instead try something like "V is a call to a function that allocates memory".</div>


<div><br></div><div><div>+/// isAllocLikeFn - returns true if V is a call to a malloc, calloc, or strdup</div><div>+/// like function.</div><div>+bool isAllocLikeFn(const Value *V, bool LookThroughBitCast = false);</div>

</div>

<div><br></div><div>You've got to include some more detail in the doxystrings here to differentiate between isAllocationFn, isAllocLikeFn and isMallocLikeFn!</div><div><br></div><div><div>+/// isCallocLikeFn - returns true if V is a call to a calloc-like function.</div>


<div>+bool isCallocLikeFn(const Value *V, bool LookThroughBitCast = false);</div></div><div><br></div><div>Similarly; calloc-like means that the returned memory is zero-initialized?</div><div><br></div><div><div>+// FIXME: can use unions here to save space</div>


<div>+struct CacheData {</div><div>+  APInt Offset;</div><div>+  Value *OffsetValue;</div><div>+  APInt Size;</div><div>+  Value *SizeValue;</div><div>+  bool ReturnVal;</div><div>+  CacheData() {}</div><div>+  CacheData(APInt Off, Value *OffVal, APInt Sz, Value *SzVal, bool Ret) :</div>


<div>+    Offset(Off), OffsetValue(OffVal), Size(Sz), SizeValue(SzVal),</div><div>+    ReturnVal(Ret) {}</div><div>+};</div></div><div><br></div><div>What are these? Each member should have a comment. Are "APInt <X>" and "Value *<X>Value" mutually exclusive? If so, why have a constructor that specifies all of them? Also, what does it really mean to construct the empty one, besides the fact that you can insert it into a DenseMap?</div>


<div><br></div><div>Also, put the ":" of the initialize list on the same line as the first variable being initialized.</div><div><br></div><div><div>+struct ObjectSizeAnalysis {</div><div>+  /// initialize analysis. If ConstOnly is false, then the APIs of this analysis</div>


</div><div><br></div><div>Use "class" since it does have private members. Capitalize "Initialize" as the start of the sentence.</div><div><br></div><div>I think this should really be two pieces, one which determines the size of things which can be done without modifying the code and a wrapper around that which builds up IR when that fails. Here's roughly what I have in mind:</div>


<div><br></div><div>  typedef SizeOffsetType std::pair<Value *, Value *>;</div><div>  class ObjectSizeOffsetVisitor : public InstVisitor<ObjectSizeOffsetVisitor, SizeOffsetType> {</div><div>  public:</div><div>


    // ... </div><div>    SizeOffsetType visitSelectInst(SelectInst &SI) {</div><div>      SizeOffset TrueSide = get(SI.getTrueValue());</div><div>      SizeOffset FalseSide = get(SI.getFalseValue());</div><div>      if (TrueSide == FalseSide) {</div>


<div>        SizeOffset = make_pair(TrueSide, SizeOffset.second);</div><div>      }</div><div>    }</div><div>    // ...</div><div>  private:</div><div>    SizeOffsetType get(Value *) { /*check cache, call this->visit(Value *) on miss*/ }</div>


<div>    // ...</div><div>    SizeOffsetType SizeOffset;  // initialize to (NULL, NULL)</div><div> };</div><div><br></div><div>class ObjectSizeEvaluator : public InstVisitor<ObjectSizeEvaluator, std::pair<Value *, Value *> > {</div>


<div>  // ...</div><div>  SizeOffsetType visitSelectInst(SelectInst &SI) {</div><div>    SizeOffsetType SizeOffset = V.visitSelectInst(SI);</div><div>    if (!SizeOffset.first || !SizeOffset.second) {</div><div>      SizeOffsetType TrueSide = this->get(SI.getTrueValue());</div>


<div>      SizeOffsetType LeftSide = this->get(SI.getTrueValue());</div><div>      if (!SizeOffset.first)</div><div>        Builder.CreateSelect(SI->getCondition(), TrueSide.first, FalseSide.first);</div><div>      if (!SizeOffset.second)</div>


<div>        Builder.CreateSelect(SI->getCondition(), TrueSide.second, FalseSide.second);</div><div>    }</div><div>  }</div><div>  // ...</div><div>  ObjectSizeOffsetVisitor V;</div><div>  IRBuilder Builder;</div><div>


}</div><div><br></div><div>Does that make sense? You'll need to implement your own visit() method that handles Arguments and GEP constantexpr's and such, but instructions should go through the existing InstVisitor interface.</div>


<div><br></div><div><div>+  ObjectSizeAnalysis(const TargetData *_TD,</div><div>+                     LLVMContext &_Context,</div><div>+                     bool _ConstOnly,</div><div>+                     bool _RoundToAlign = false) :</div>


</div><div><br></div><div>Please don't use leading underscores! That's reserved. I personally like to just use the exact same names as the members (yes, the scoping rules work out such that "TD(TD)" will initialize TD the member from TD the constructor argument) but if you don't like that place the underscores on the end.</div>


<div><br></div><div><div>Index: lib/Analysis/IPA/GlobalsModRef.cpp</div><div>===================================================================</div><div>--- lib/Analysis/IPA/GlobalsModRef.cpp<span class="Apple-tab-span" style="white-space:pre">  </span>(revision 158666)</div>


<div>+++ lib/Analysis/IPA/GlobalsModRef.cpp<span class="Apple-tab-span" style="white-space:pre">        </span>(working copy)</div><div>@@ -329,15 +329,8 @@</div><div>       // Check the value being stored.</div><div>       Value *Ptr = GetUnderlyingObject(SI->getOperand(0));</div>


<div> </div><div>-      if (isMalloc(Ptr)) {</div><div>-        // Okay, easy case.</div><div>-      } else if (CallInst *CI = dyn_cast<CallInst>(Ptr)) {</div><div>-        Function *F = CI->getCalledFunction();</div>


<div>-        if (!F || !F->isDeclaration()) return false;     // Too hard to analyze.</div><div>-        if (F->getName() != "calloc") return false;   // Not calloc.</div><div>-      } else {</div><div>+      if (!isNoAliasFn(Ptr))</div>


<div>         return false;  // Too hard to analyze.</div><div>-      }</div><div><br></div></div><div>I'm not sure this is right.. There's another assumption that globals mod-ref is making which isn't in the comments; it's assuming that the called functions (malloc and calloc) have no side-effects other than the allocation of memory. In general, the callee needs to mark "FunctionInfo[Writers[i]].GlobalInfo[I] |= Mod" when isNoAlias but can remove that marking when it really is malloc/calloc.</div>


<div><br></div><div>+enum AllocType {</div><div><div>+  MallocLike         = 1<<0, // allocates</div><div>+  CallocLike         = 1<<1, // allocates + bzero</div><div>+  ReallocLike        = 1<<2, // reallocates</div>


<div>+  StrDupLike         = 1<<3,</div><div>+  AllocLike          = MallocLike | CallocLike | StrDupLike,</div><div>+  AnyAlloc           = MallocLike | CallocLike | ReallocLike | StrDupLike</div><div>+};</div></div>


<div><br></div><div>I don't understand why StrDup gets special handling? "fopen" also returns newly allocated memory. What they have in common is the noalias marker. Are you using this like calloc to indicate that the memory is initialized to a copy of other memory? Please add comments.</div>


<div><br></div><div><div>+/// isAllocationCall - returns the index of the AllocationFnData array if V is</div><div>+/// a call to a known allocation function, and -1 otherwise.</div><div><br></div><div>This is a very strange function. Why not return a pointer to the array element or NULL if that's the only legal use of its return? Why does its name start with "is"?</div>


<div><br></div><div>+static unsigned isAllocationCall(const Value *V, AllocType AllocTy,</div><div>+                                 bool LookThroughBitCast) {</div><div>+  Function *Callee = getCalledFunction(V, LookThroughBitCast);</div>


<div>+  if (!Callee)</div><div>+    return -1;</div><div>+</div><div>+  unsigned i = 0;</div><div>+  bool found = false;</div><div>+  for ( ; i < sizeof(AllocationFnData)/sizeof(*AllocationFnData); ++i) {</div><div><br>


</div><div>Please use i < array_lengthof(...) from ADT/STLExtras.h.</div><div><br></div><div>+    if (Callee->getName() == AllocationFnData[i].Name) {</div><div>+      found = true;</div><div>+      break;</div><div>


+    }</div><div>+  }</div><div>+  if (!found)</div><div>+    return -1;</div><div>+</div><div>+  if ((AllocationFnData[i].AllocTy & AllocTy) == 0)</div><div>+    return -1;</div></div><div><br></div><div><div>+STATISTIC(ChecksUnableInterproc, "ObjectSizeAnalysis: unable interproc");</div>


<div>+STATISTIC(ChecksUnableLoad, "ObjectSizeAnalysis: unable LoadInst");</div></div><div><br></div><div>Please use full text in the statistic descriptions, see the others for example.</div><div><br></div><div>

<div>

+#define GET_VALUE(Val, Int) \</div><div>+  if (!Val) \</div><div>+    Val = ConstantInt::get(IntTy, Int)</div></div><div><br></div><div>Please don't use macros. We really hate macros. Hopefully you can structure the code so that you won't need this?</div>


<div><br></div><div><br></div><div><br></div><div>Nick</div><div><br></div><div>On 18 June 2012 14:19, Nuno Lopes <span dir="ltr"><<a href="mailto:nunoplopes@sapo.pt" target="_blank">nunoplopes@sapo.pt</a>></span> wrote:</div>


<div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br>

<br>

Right now, there are multiple implementations of analysis of memory allocation functions across the LLVM codebase, including:<br>

 - computing size of pointed objects (dead store elimination, instcombine, basic alias analysis)<br>

 - recognize allocation functions (e.g., to know that the return value doesn't alias anything else, or to recognize uninitialized allocated memory for undef purposes).<br>

<br>

Therefore I send in attach a patch to refactor the MemoryBuiltin Analysis to include all the functionality needed by the other passes just mentioned. It removes a lot of code duplication, and the proposed implementation is better than all the previous ones combined. The patch adds support for the noalias function attribute as well.<br>


This patch also adds support for computing the size of pointed objects at run-time, which is basically a copy from the bounds checking pass. After this patch goes in, most of the code in the bounds checking pass can be removed.<br>


<br>

Any comments/suggestions/etc..?<br>

<br>

Thanks,<br>

Nuno<br>

<br>_______________________________________________<br>

llvm-commits mailing list<br>

<a href="mailto:llvm-commits@cs.uiuc.edu">llvm-commits@cs.uiuc.edu</a><br>

<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a><br>

<br></blockquote></div><br></font></div>