[llvm-commits] [PATCH] New pass: pointer (bounds) tracking!

Wed Jul 8 13:43:35 PDT 2009

Hi Edwin,

On Jul 2, 2009, at 12:32 PM, Török Edwin wrote:

> Hi,
>
> The attached patch introduces a new pass that can be queried for
> allocation size  of pointers,
> and contains a simple solver [*] for (a ult b) inequalities useful in
> bounds checking. This is based on the bounds checker tool I developed
> for my undergraduate thesis.

This sounds cool. Thanks for your patience; I'm finally getting around
to taking a look.

>
> If this pass is something that would be accepted in LLVM, I will  
> cleanup
> the code to conform to LLVM coding style, add testcases, and resubmit
> the patch. I have just attached the patch now as a preview.
>
> I've seen that there is a lot of work being done to support SSI/ 
> ABCD. I
> think my approach is complementary: I try to determine whether the
> existing bounds checks are sufficient for the safety of a memory  
> access,
> and not whether any of the bounds checks are redundant and can be  
> removed.

It may be that the code in your pass could be used for both purposes.
It will be interesting to compare different approaches.

>
> I haven't made up  by mind whether all this should be in one pass,  
> or I
> should split it up further:
> PointerTracking for size information and bounds checking, and  
> SCEVSolver
> for the SCEV inequality solver?
> Or maybe the SCEVSolver parts should be integrated directly into
> ScalarEvolution?

It looks like most of this code could be integrated into  
ScalarEvolution,
and it would help out a number of important optimizations.  However, I
think it'd be fine if you wanted to check it in as a separate analysis
first and do the integration in separate steps. There are some tricky
areas, and this would make it easier to see what's being changed.

>
> [*]
> The solver uses isLoopGuardedByCond(), but is able to handle more
> complicated situations by using
> properties of SCEV expressions, and transitivity of ULT/ULE:
> - if a SCEVAddRecExpr is monotonic it checks whether the start and  
> exit
> value is known to be ULT Limit.

This is cool :-).

> - for PHI nodes that are SCEVUnknowns it checks whether all parts of
> the PHI are known to be ULT
> - if a value is a constant offset away from a PHI: C + X <u B -> X  
> <u B-C
> It also looks for predicates among the basic blocks that dominate the
> memory access, trying to
> find a predicate that is sufficient for the validity of the memory  
> access.
>
> Best regards,
> --Edwin
>
> <pointertracking.patch>

Here are some comments on the patch.

 > namespace llvm {
 >   char PointerTracking::ID=0;
 >
 >   [...]

Don't indent namespace contents.

 >   void PointerTracking::getAnalysisUsage(AnalysisUsage &AU) const {
 >     AU.addRequired<DominatorTree>();
 >     AU.addRequired<LoopInfo>();
 >     AU.addRequired<ScalarEvolution>();

These should be addRequiredTransitive instead of addRequired, so that  
the
required analyses are preserved throughout the lifetime of the
PointerTracking analysis.

 >    callocFunc = M.getFunction("calloc");
[...]
 >    reallocFunc = M.getFunction("realloc");

I don't know what the current prevailing fasion is with respect to
passes knowing special things about functions with standard C library
names. It would be good to mention that this pass knows about
"calloc" and "realloc" in a high-level comment though.

 >   const SCEV *PointerTracking::getAllocationSize(Value *V)

This name is a little confusing. Can you rename this to
getAllocationElementCount or something?

 >   {
 >     if (AllocationInst *AI = dyn_cast<AllocationInst>(V)) {
 >       Value *arraySize = AI->getArraySize();
 >       if (ConstantInt *C = dyn_cast<ConstantInt>(arraySize)) {
 >         return SE->getConstant(C);
 >       }
 >       return SE->getSCEVAtScope(arraySize, LI->getLoopFor(AI- 
 >getParent()));

It isn't actually necessary to check for a ConstantInt specially
here. getSCEVAtScope will effectively do this automatically.

 >     }
 >     if (GlobalVariable *GV = dyn_cast<GlobalVariable>(V)) {
 >       if (GV->hasInitializer()) {
 >         Constant *C = GV->getInitializer();
 >         if (const ArrayType *ATy = dyn_cast<ArrayType>(C->getType 
())) {
 >           return SE->getConstant(Type::Int32Ty, ATy->getNumElements 
());

Hard-coding Int32Ty here may be problematic in some cases.

 >         }
 >       }
 >       return SE->getConstant(Type::Int32Ty, 1);
 >     }
 >     // TODO: implement more complicated pointer size tracking
 >     return 0;
 >   }

 >   const SCEV *PointerTracking::getAllocationSizeInBytes(Value *V)
 >   {
 >     assert(TD && "TargetData must be available to calculate size in  
bytes!");

Would it make sense to postpone this check until TD is actually
needed, and then return CouldNotCompute if TD is unavailable?

 >     const SCEV *S = getAllocationSize(V);
 >     if (isa<SCEVCouldNotCompute>(S)) {
 >       if (CallInst *CI = dyn_cast<CallInst>(V)) {
 >         CallSite CS(CI);
 >         Function *F = dyn_cast<Function>(CS.getCalledValue()- 
 >stripPointerCasts());

This line exceeds 80 columns.

 >         const Loop *L = LI->getLoopFor(CI->getParent());
 >         if (F == callocFunc) {
 >           return
 >             SE->getSCEVAtScope(SE->getMulExpr(SE->getSCEV 
(CS.getArgument(0)),
 >                                               SE->getSCEV 
(CS.getArgument(1))),
 >                                L);
 >         } else if (F == reallocFunc) {
 >           return SE->getSCEVAtScope(CS.getArgument(1), L);
 >         }
 >       }
 >       return S;
 >     }
 >     uint64_t elementsize = TD->getTypeStoreSize(V->getType());
 >     return SE->getMulExpr(SE->getConstant(S->getType(),  
elementsize), S);

The size of the allocation will be the number of elements multiplied
by the "Alloc" size, not the "Store" size.

 >  const SCEV *getStart(const SCEV *A, const Loop *L)
 >  {

Here and elsewhere, the brace for a function definition goes on the
same line as the close paren.

 >    if (isa<SCEVConstant>(A))
 >      return A;

This would be a little stronger as A->isLoopInvariant(L).

 >     DomTreeNodeBase<BasicBlock> *N = DT->getNode(BB);
 >     DomTreeNodeBase<BasicBlock> *D = N->getIDom();
 >     while (D) { // D dominates N
 >       BasicBlock *dBB = D->getBlock();
 >       bool negated;
 >       Value *V = getConditionToReach(dBB, BB, negated);
 >       if (conditionSufficient(V, negated, I, ICmpInst::ICMP_ULT,  
Limit))
 >         return AlwaysTrue;
 >       BB = dBB;
 >       D = D->getIDom();
 >     }

This doesn't appear correct. Even if D dominates N and has a
conditional branch with a successor that reaches N, it's possible
that the other successor of the conditional branch reaches N also.
See ScalarEvolution's getPredecessorWithUniqueSuccessorForBB for
code that solves a similar problem.

Dan