[llvm-commits] [PATCH] New pass: pointer (bounds) tracking!

Mon Jul 13 13:59:56 PDT 2009

On 2009-07-08 23:59, Török Edwin wrote:
> On 2009-07-08 23:43, Dan Gohman wrote:
>   
>> Hi Edwin,
>>
>> On Jul 2, 2009, at 12:32 PM, Török Edwin wrote:
>>
>>
>>   
>>     
>>> Hi,
>>>
>>> The attached patch introduces a new pass that can be queried for
>>> allocation size  of pointers,
>>> and contains a simple solver [*] for (a ult b) inequalities useful in
>>> bounds checking. This is based on the bounds checker tool I developed
>>> for my undergraduate thesis.
>>>     
>>>       
>> This sounds cool. Thanks for your patience; I'm finally getting around
>> to taking a look.
>>
>>   
>>     
>
> Hi Dan,
>
> I am replying to part of your email now, next part will come tomorrow.
>   

Please review the attached is pointertracking1.patch, which should be
ready to be applied to LLVM.

I also attached pointertracking_notready.patch, which  is a WIP for
checkLimits, I'll submit it again once pointertracking1.patch is applied
to LLVM.
This is not ready to be reviewed at this time, I still have some
unimplemented parts there, and it needs more testing.

>>   
>>     
>>> [*]
>>> The solver uses isLoopGuardedByCond(), but is able to handle more
>>> complicated situations by using
>>> properties of SCEV expressions, and transitivity of ULT/ULE:
>>> - if a SCEVAddRecExpr is monotonic it checks whether the start and  
>>> exit
>>> value is known to be ULT Limit.
>>>     
>>>       
>> This is cool :-).
>>
>>     

I moved this to pointertracking_notready.patch, I have some more
(unimplemented) ideas for checking
monotonicity, and limits, based on derivating AddRecExpr ;)

>>   
>>     
>>> - for PHI nodes that are SCEVUnknowns it checks whether all parts of
>>> the PHI are known to be ULT
>>> - if a value is a constant offset away from a PHI: C + X <u B -> X  
>>> <u B-C
>>> It also looks for predicates among the basic blocks that dominate the
>>> memory access, trying to
>>> find a predicate that is sufficient for the validity of the memory  
>>> access.
>>>
>>> Best regards,
>>> --Edwin
>>>
>>> <pointertracking.patch>
>>>     
>>>       
>> Here are some comments on the patch.
>>
>>  > namespace llvm {
>>  >   char PointerTracking::ID=0;
>>  >
>>  >   [...]
>>
>> Don't indent namespace contents.
>>
>>  >   void PointerTracking::getAnalysisUsage(AnalysisUsage &AU) const {
>>  >     AU.addRequired<DominatorTree>();
>>  >     AU.addRequired<LoopInfo>();
>>  >     AU.addRequired<ScalarEvolution>();
>>
>> These should be addRequiredTransitive instead of addRequired, so that  
>> the
>> required analyses are preserved throughout the lifetime of the
>> PointerTracking analysis.
>>
>>  >    callocFunc = M.getFunction("calloc");
>> [...]
>>  >    reallocFunc = M.getFunction("realloc");
>>
>> I don't know what the current prevailing fasion is with respect to
>> passes knowing special things about functions with standard C library
>> names. It would be good to mention that this pass knows about
>> "calloc" and "realloc" in a high-level comment though.
>>   
>>     
>
> Ok.
>
>   
>>  >   const SCEV *PointerTracking::getAllocationSize(Value *V)
>>
>> This name is a little confusing. Can you rename this to
>> getAllocationElementCount or something?
>>   
>>     
>
> Ok.
>   

I also refactored the internals of this method. The implementation
should be more straight forward now:
 - we request to know the element count of a given Type (not necessarely
same as allocated type)
 - we explicitly get the type of allocated element
 - we check the size of the 2 type, if possible we convert the count
without using TargetData
- if necessary I use TargetData to convert count

Now getAllocationSizeInBytes simply asks for element count with
Type::Int8Ty.

>   
>>  >   {
>>  >     if (AllocationInst *AI = dyn_cast<AllocationInst>(V)) {
>>  >       Value *arraySize = AI->getArraySize();
>>  >       if (ConstantInt *C = dyn_cast<ConstantInt>(arraySize)) {
>>  >         return SE->getConstant(C);
>>  >       }
>>  >       return SE->getSCEVAtScope(arraySize, LI->getLoopFor(AI- 
>>  >getParent()));
>>
>> It isn't actually necessary to check for a ConstantInt specially
>> here. getSCEVAtScope will effectively do this automatically.
>>     

Fixed, I don't use SCEVAtScope here anymore either.

>>  >     }
>>  >     if (GlobalVariable *GV = dyn_cast<GlobalVariable>(V)) {
>>  >       if (GV->hasInitializer()) {
>>  >         Constant *C = GV->getInitializer();
>>  >         if (const ArrayType *ATy = dyn_cast<ArrayType>(C->getType 
>> ())) {
>>  >           return SE->getConstant(Type::Int32Ty, ATy->getNumElements 
>> ());
>>
>> Hard-coding Int32Ty here may be problematic in some cases.
>>     

I think in general I can't guarantee what the type of the returned SCEV
is, the client
should convert it to a compatible type when using it  (and I'm doing
some conversion in compareSCEV too).
>>  >         }
>>  >       }
>>  >       return SE->getConstant(Type::Int32Ty, 1);
>>  >     }
>>  >     // TODO: implement more complicated pointer size tracking
>>  >     return 0;
>>  >   }
>>
>>  >   const SCEV *PointerTracking::getAllocationSizeInBytes(Value *V)
>>  >   {
>>  >     assert(TD && "TargetData must be available to calculate size in  
>> bytes!");
>>
>> Would it make sense to postpone this check until TD is actually
>> needed, and then return CouldNotCompute if TD is unavailable?
>>   
>>     
>
> Yes.
>   

Done, I also calculate counts using getPrimitiveSizeInBits() if
available, and only after that try TargetData (if available).

>>  >     const SCEV *S = getAllocationSize(V);
>>  >     if (isa<SCEVCouldNotCompute>(S)) {
>>  >       if (CallInst *CI = dyn_cast<CallInst>(V)) {
>>  >         CallSite CS(CI);
>>  >         Function *F = dyn_cast<Function>(CS.getCalledValue()- 
>>  >stripPointerCasts());
>>
>> This line exceeds 80 columns.
>>
>>  >         const Loop *L = LI->getLoopFor(CI->getParent());
>>  >         if (F == callocFunc) {
>>  >           return
>>  >             SE->getSCEVAtScope(SE->getMulExpr(SE->getSCEV 
>> (CS.getArgument(0)),
>>  >                                               SE->getSCEV 
>> (CS.getArgument(1))),
>>  >                                L);
>>  >         } else if (F == reallocFunc) {
>>  >           return SE->getSCEVAtScope(CS.getArgument(1), L);
>>  >         }
>>  >       }
>>  >       return S;
>>  >     }
>>  >     uint64_t elementsize = TD->getTypeStoreSize(V->getType());
>>  >     return SE->getMulExpr(SE->getConstant(S->getType(),  
>> elementsize), S);
>>
>> The size of the allocation will be the number of elements multiplied
>> by the "Alloc" size, not the "Store" size.
>>   
>>     
>
> Thanks, will fix.
>
>   
>>  >  const SCEV *getStart(const SCEV *A, const Loop *L)
>>  >  {
>>
>> Here and elsewhere, the brace for a function definition goes on the
>> same line as the close paren.
>>   
>>     
>
> Ok.
>   
>>  >    if (isa<SCEVConstant>(A))
>>  >      return A;
>>
>> This would be a little stronger as A->isLoopInvariant(L).
>>
>>  >     DomTreeNodeBase<BasicBlock> *N = DT->getNode(BB);
>>  >     DomTreeNodeBase<BasicBlock> *D = N->getIDom();
>>  >     while (D) { // D dominates N
>>  >       BasicBlock *dBB = D->getBlock();
>>  >       bool negated;
>>  >       Value *V = getConditionToReach(dBB, BB, negated);
>>  >       if (conditionSufficient(V, negated, I, ICmpInst::ICMP_ULT,  
>> Limit))
>>  >         return AlwaysTrue;
>>  >       BB = dBB;
>>  >       D = D->getIDom();
>>  >     }
>>
>> This doesn't appear correct. Even if D dominates N and has a
>> conditional branch with a successor that reaches N, it's possible
>> that the other successor of the conditional branch reaches N also.
>> See ScalarEvolution's getPredecessorWithUniqueSuccessorForBB for
>> code that solves a similar problem.
>>   
>>     
>
> That is quite a serious bug in my dominator predicate handling then,
> I'll have to be more careful.
> I'll have a look at this tomorrow, and try to post an updated patch.
>   

I fixed this in pointertracking_notready.patch by checking that:
- there is one and only one successor  of the branch that dominates the
target BB, excluding backedges
-  all predecessors of the targetBB are dominated by that successor

I still need to add testcases for this, and review all the methods in
pointertracking_notready.patch once more.

Best regards,
--Edwin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pointertracking1.patch
Type: text/x-diff
Size: 20620 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20090713/fb086b3f/attachment.patch>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pointertracking_notready.patch
Type: text/x-diff
Size: 12264 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20090713/fb086b3f/attachment-0001.patch>