[llvm-commits] [PATCH] New pass: pointer (bounds) tracking!
Török Edwin
edwintorok at gmail.com
Wed Jul 8 13:59:24 PDT 2009
On 2009-07-08 23:43, Dan Gohman wrote:
> Hi Edwin,
>
> On Jul 2, 2009, at 12:32 PM, Török Edwin wrote:
>
>
>
>> Hi,
>>
>> The attached patch introduces a new pass that can be queried for
>> allocation size of pointers,
>> and contains a simple solver [*] for (a ult b) inequalities useful in
>> bounds checking. This is based on the bounds checker tool I developed
>> for my undergraduate thesis.
>>
>
> This sounds cool. Thanks for your patience; I'm finally getting around
> to taking a look.
>
>
Hi Dan,
I am replying to part of your email now, next part will come tomorrow.
>> If this pass is something that would be accepted in LLVM, I will
>> cleanup
>> the code to conform to LLVM coding style, add testcases, and resubmit
>> the patch. I have just attached the patch now as a preview.
>>
>> I've seen that there is a lot of work being done to support SSI/
>> ABCD. I
>> think my approach is complementary: I try to determine whether the
>> existing bounds checks are sufficient for the safety of a memory
>> access,
>> and not whether any of the bounds checks are redundant and can be
>> removed.
>>
>
> It may be that the code in your pass could be used for both purposes.
> It will be interesting to compare different approaches.
>
Cool.
>
>> I haven't made up by mind whether all this should be in one pass,
>> or I
>> should split it up further:
>> PointerTracking for size information and bounds checking, and
>> SCEVSolver
>> for the SCEV inequality solver?
>> Or maybe the SCEVSolver parts should be integrated directly into
>> ScalarEvolution?
>>
>
> It looks like most of this code could be integrated into
> ScalarEvolution,
> and it would help out a number of important optimizations. However, I
> think it'd be fine if you wanted to check it in as a separate analysis
> first and do the integration in separate steps. There are some tricky
> areas, and this would make it easier to see what's being changed.
>
I'll keep it separate from ScalarEvolution if I can.
>
>> [*]
>> The solver uses isLoopGuardedByCond(), but is able to handle more
>> complicated situations by using
>> properties of SCEV expressions, and transitivity of ULT/ULE:
>> - if a SCEVAddRecExpr is monotonic it checks whether the start and
>> exit
>> value is known to be ULT Limit.
>>
>
> This is cool :-).
>
>
>> - for PHI nodes that are SCEVUnknowns it checks whether all parts of
>> the PHI are known to be ULT
>> - if a value is a constant offset away from a PHI: C + X <u B -> X
>> <u B-C
>> It also looks for predicates among the basic blocks that dominate the
>> memory access, trying to
>> find a predicate that is sufficient for the validity of the memory
>> access.
>>
>> Best regards,
>> --Edwin
>>
>> <pointertracking.patch>
>>
>
> Here are some comments on the patch.
>
> > namespace llvm {
> > char PointerTracking::ID=0;
> >
> > [...]
>
> Don't indent namespace contents.
>
> > void PointerTracking::getAnalysisUsage(AnalysisUsage &AU) const {
> > AU.addRequired<DominatorTree>();
> > AU.addRequired<LoopInfo>();
> > AU.addRequired<ScalarEvolution>();
>
> These should be addRequiredTransitive instead of addRequired, so that
> the
> required analyses are preserved throughout the lifetime of the
> PointerTracking analysis.
>
> > callocFunc = M.getFunction("calloc");
> [...]
> > reallocFunc = M.getFunction("realloc");
>
> I don't know what the current prevailing fasion is with respect to
> passes knowing special things about functions with standard C library
> names. It would be good to mention that this pass knows about
> "calloc" and "realloc" in a high-level comment though.
>
Ok.
> > const SCEV *PointerTracking::getAllocationSize(Value *V)
>
> This name is a little confusing. Can you rename this to
> getAllocationElementCount or something?
>
Ok.
> > {
> > if (AllocationInst *AI = dyn_cast<AllocationInst>(V)) {
> > Value *arraySize = AI->getArraySize();
> > if (ConstantInt *C = dyn_cast<ConstantInt>(arraySize)) {
> > return SE->getConstant(C);
> > }
> > return SE->getSCEVAtScope(arraySize, LI->getLoopFor(AI-
> >getParent()));
>
> It isn't actually necessary to check for a ConstantInt specially
> here. getSCEVAtScope will effectively do this automatically.
>
> > }
> > if (GlobalVariable *GV = dyn_cast<GlobalVariable>(V)) {
> > if (GV->hasInitializer()) {
> > Constant *C = GV->getInitializer();
> > if (const ArrayType *ATy = dyn_cast<ArrayType>(C->getType
> ())) {
> > return SE->getConstant(Type::Int32Ty, ATy->getNumElements
> ());
>
> Hard-coding Int32Ty here may be problematic in some cases.
>
> > }
> > }
> > return SE->getConstant(Type::Int32Ty, 1);
> > }
> > // TODO: implement more complicated pointer size tracking
> > return 0;
> > }
>
> > const SCEV *PointerTracking::getAllocationSizeInBytes(Value *V)
> > {
> > assert(TD && "TargetData must be available to calculate size in
> bytes!");
>
> Would it make sense to postpone this check until TD is actually
> needed, and then return CouldNotCompute if TD is unavailable?
>
Yes.
> > const SCEV *S = getAllocationSize(V);
> > if (isa<SCEVCouldNotCompute>(S)) {
> > if (CallInst *CI = dyn_cast<CallInst>(V)) {
> > CallSite CS(CI);
> > Function *F = dyn_cast<Function>(CS.getCalledValue()-
> >stripPointerCasts());
>
> This line exceeds 80 columns.
>
> > const Loop *L = LI->getLoopFor(CI->getParent());
> > if (F == callocFunc) {
> > return
> > SE->getSCEVAtScope(SE->getMulExpr(SE->getSCEV
> (CS.getArgument(0)),
> > SE->getSCEV
> (CS.getArgument(1))),
> > L);
> > } else if (F == reallocFunc) {
> > return SE->getSCEVAtScope(CS.getArgument(1), L);
> > }
> > }
> > return S;
> > }
> > uint64_t elementsize = TD->getTypeStoreSize(V->getType());
> > return SE->getMulExpr(SE->getConstant(S->getType(),
> elementsize), S);
>
> The size of the allocation will be the number of elements multiplied
> by the "Alloc" size, not the "Store" size.
>
Thanks, will fix.
> > const SCEV *getStart(const SCEV *A, const Loop *L)
> > {
>
> Here and elsewhere, the brace for a function definition goes on the
> same line as the close paren.
>
Ok.
> > if (isa<SCEVConstant>(A))
> > return A;
>
> This would be a little stronger as A->isLoopInvariant(L).
>
> > DomTreeNodeBase<BasicBlock> *N = DT->getNode(BB);
> > DomTreeNodeBase<BasicBlock> *D = N->getIDom();
> > while (D) { // D dominates N
> > BasicBlock *dBB = D->getBlock();
> > bool negated;
> > Value *V = getConditionToReach(dBB, BB, negated);
> > if (conditionSufficient(V, negated, I, ICmpInst::ICMP_ULT,
> Limit))
> > return AlwaysTrue;
> > BB = dBB;
> > D = D->getIDom();
> > }
>
> This doesn't appear correct. Even if D dominates N and has a
> conditional branch with a successor that reaches N, it's possible
> that the other successor of the conditional branch reaches N also.
> See ScalarEvolution's getPredecessorWithUniqueSuccessorForBB for
> code that solves a similar problem.
>
That is quite a serious bug in my dominator predicate handling then,
I'll have to be more careful.
I'll have a look at this tomorrow, and try to post an updated patch.
Best regards,
--Edwin
More information about the llvm-commits
mailing list