[llvm-commits] [PATCH] New pass: pointer (bounds) tracking!

Wed Jul 8 13:59:24 PDT 2009

On 2009-07-08 23:43, Dan Gohman wrote:
> Hi Edwin,
>
> On Jul 2, 2009, at 12:32 PM, Török Edwin wrote:
>
>
>   
>> Hi,
>>
>> The attached patch introduces a new pass that can be queried for
>> allocation size  of pointers,
>> and contains a simple solver [*] for (a ult b) inequalities useful in
>> bounds checking. This is based on the bounds checker tool I developed
>> for my undergraduate thesis.
>>     
>
> This sounds cool. Thanks for your patience; I'm finally getting around
> to taking a look.
>
>   

Hi Dan,

I am replying to part of your email now, next part will come tomorrow.

>> If this pass is something that would be accepted in LLVM, I will  
>> cleanup
>> the code to conform to LLVM coding style, add testcases, and resubmit
>> the patch. I have just attached the patch now as a preview.
>>
>> I've seen that there is a lot of work being done to support SSI/ 
>> ABCD. I
>> think my approach is complementary: I try to determine whether the
>> existing bounds checks are sufficient for the safety of a memory  
>> access,
>> and not whether any of the bounds checks are redundant and can be  
>> removed.
>>     
>
> It may be that the code in your pass could be used for both purposes.
> It will be interesting to compare different approaches.
>   

Cool.

>   
>> I haven't made up  by mind whether all this should be in one pass,  
>> or I
>> should split it up further:
>> PointerTracking for size information and bounds checking, and  
>> SCEVSolver
>> for the SCEV inequality solver?
>> Or maybe the SCEVSolver parts should be integrated directly into
>> ScalarEvolution?
>>     
>
> It looks like most of this code could be integrated into  
> ScalarEvolution,
> and it would help out a number of important optimizations.  However, I
> think it'd be fine if you wanted to check it in as a separate analysis
> first and do the integration in separate steps. There are some tricky
> areas, and this would make it easier to see what's being changed.
>   

I'll keep it separate from ScalarEvolution if I can.

>   
>> [*]
>> The solver uses isLoopGuardedByCond(), but is able to handle more
>> complicated situations by using
>> properties of SCEV expressions, and transitivity of ULT/ULE:
>> - if a SCEVAddRecExpr is monotonic it checks whether the start and  
>> exit
>> value is known to be ULT Limit.
>>     
>
> This is cool :-).
>
>   
>> - for PHI nodes that are SCEVUnknowns it checks whether all parts of
>> the PHI are known to be ULT
>> - if a value is a constant offset away from a PHI: C + X <u B -> X  
>> <u B-C
>> It also looks for predicates among the basic blocks that dominate the
>> memory access, trying to
>> find a predicate that is sufficient for the validity of the memory  
>> access.
>>
>> Best regards,
>> --Edwin
>>
>> <pointertracking.patch>
>>     
>
> Here are some comments on the patch.
>
>  > namespace llvm {
>  >   char PointerTracking::ID=0;
>  >
>  >   [...]
>
> Don't indent namespace contents.
>
>  >   void PointerTracking::getAnalysisUsage(AnalysisUsage &AU) const {
>  >     AU.addRequired<DominatorTree>();
>  >     AU.addRequired<LoopInfo>();
>  >     AU.addRequired<ScalarEvolution>();
>
> These should be addRequiredTransitive instead of addRequired, so that  
> the
> required analyses are preserved throughout the lifetime of the
> PointerTracking analysis.
>
>  >    callocFunc = M.getFunction("calloc");
> [...]
>  >    reallocFunc = M.getFunction("realloc");
>
> I don't know what the current prevailing fasion is with respect to
> passes knowing special things about functions with standard C library
> names. It would be good to mention that this pass knows about
> "calloc" and "realloc" in a high-level comment though.
>   

Ok.

>  >   const SCEV *PointerTracking::getAllocationSize(Value *V)
>
> This name is a little confusing. Can you rename this to
> getAllocationElementCount or something?
>   

Ok.

>  >   {
>  >     if (AllocationInst *AI = dyn_cast<AllocationInst>(V)) {
>  >       Value *arraySize = AI->getArraySize();
>  >       if (ConstantInt *C = dyn_cast<ConstantInt>(arraySize)) {
>  >         return SE->getConstant(C);
>  >       }
>  >       return SE->getSCEVAtScope(arraySize, LI->getLoopFor(AI- 
>  >getParent()));
>
> It isn't actually necessary to check for a ConstantInt specially
> here. getSCEVAtScope will effectively do this automatically.
>
>  >     }
>  >     if (GlobalVariable *GV = dyn_cast<GlobalVariable>(V)) {
>  >       if (GV->hasInitializer()) {
>  >         Constant *C = GV->getInitializer();
>  >         if (const ArrayType *ATy = dyn_cast<ArrayType>(C->getType 
> ())) {
>  >           return SE->getConstant(Type::Int32Ty, ATy->getNumElements 
> ());
>
> Hard-coding Int32Ty here may be problematic in some cases.
>
>  >         }
>  >       }
>  >       return SE->getConstant(Type::Int32Ty, 1);
>  >     }
>  >     // TODO: implement more complicated pointer size tracking
>  >     return 0;
>  >   }
>
>  >   const SCEV *PointerTracking::getAllocationSizeInBytes(Value *V)
>  >   {
>  >     assert(TD && "TargetData must be available to calculate size in  
> bytes!");
>
> Would it make sense to postpone this check until TD is actually
> needed, and then return CouldNotCompute if TD is unavailable?
>   

Yes.
>  >     const SCEV *S = getAllocationSize(V);
>  >     if (isa<SCEVCouldNotCompute>(S)) {
>  >       if (CallInst *CI = dyn_cast<CallInst>(V)) {
>  >         CallSite CS(CI);
>  >         Function *F = dyn_cast<Function>(CS.getCalledValue()- 
>  >stripPointerCasts());
>
> This line exceeds 80 columns.
>
>  >         const Loop *L = LI->getLoopFor(CI->getParent());
>  >         if (F == callocFunc) {
>  >           return
>  >             SE->getSCEVAtScope(SE->getMulExpr(SE->getSCEV 
> (CS.getArgument(0)),
>  >                                               SE->getSCEV 
> (CS.getArgument(1))),
>  >                                L);
>  >         } else if (F == reallocFunc) {
>  >           return SE->getSCEVAtScope(CS.getArgument(1), L);
>  >         }
>  >       }
>  >       return S;
>  >     }
>  >     uint64_t elementsize = TD->getTypeStoreSize(V->getType());
>  >     return SE->getMulExpr(SE->getConstant(S->getType(),  
> elementsize), S);
>
> The size of the allocation will be the number of elements multiplied
> by the "Alloc" size, not the "Store" size.
>   

Thanks, will fix.

>  >  const SCEV *getStart(const SCEV *A, const Loop *L)
>  >  {
>
> Here and elsewhere, the brace for a function definition goes on the
> same line as the close paren.
>   

Ok.
>  >    if (isa<SCEVConstant>(A))
>  >      return A;
>
> This would be a little stronger as A->isLoopInvariant(L).
>
>  >     DomTreeNodeBase<BasicBlock> *N = DT->getNode(BB);
>  >     DomTreeNodeBase<BasicBlock> *D = N->getIDom();
>  >     while (D) { // D dominates N
>  >       BasicBlock *dBB = D->getBlock();
>  >       bool negated;
>  >       Value *V = getConditionToReach(dBB, BB, negated);
>  >       if (conditionSufficient(V, negated, I, ICmpInst::ICMP_ULT,  
> Limit))
>  >         return AlwaysTrue;
>  >       BB = dBB;
>  >       D = D->getIDom();
>  >     }
>
> This doesn't appear correct. Even if D dominates N and has a
> conditional branch with a successor that reaches N, it's possible
> that the other successor of the conditional branch reaches N also.
> See ScalarEvolution's getPredecessorWithUniqueSuccessorForBB for
> code that solves a similar problem.
>   

That is quite a serious bug in my dominator predicate handling then,
I'll have to be more careful.
I'll have a look at this tomorrow, and try to post an updated patch.

Best regards,
--Edwin