[llvm-commits] [PATCH] New pass: pointer (bounds) tracking!
Dan Gohman
gohman at apple.com
Wed Jul 8 13:43:35 PDT 2009
Hi Edwin,
On Jul 2, 2009, at 12:32 PM, Török Edwin wrote:
> Hi,
>
> The attached patch introduces a new pass that can be queried for
> allocation size of pointers,
> and contains a simple solver [*] for (a ult b) inequalities useful in
> bounds checking. This is based on the bounds checker tool I developed
> for my undergraduate thesis.
This sounds cool. Thanks for your patience; I'm finally getting around
to taking a look.
>
> If this pass is something that would be accepted in LLVM, I will
> cleanup
> the code to conform to LLVM coding style, add testcases, and resubmit
> the patch. I have just attached the patch now as a preview.
>
> I've seen that there is a lot of work being done to support SSI/
> ABCD. I
> think my approach is complementary: I try to determine whether the
> existing bounds checks are sufficient for the safety of a memory
> access,
> and not whether any of the bounds checks are redundant and can be
> removed.
It may be that the code in your pass could be used for both purposes.
It will be interesting to compare different approaches.
>
> I haven't made up by mind whether all this should be in one pass,
> or I
> should split it up further:
> PointerTracking for size information and bounds checking, and
> SCEVSolver
> for the SCEV inequality solver?
> Or maybe the SCEVSolver parts should be integrated directly into
> ScalarEvolution?
It looks like most of this code could be integrated into
ScalarEvolution,
and it would help out a number of important optimizations. However, I
think it'd be fine if you wanted to check it in as a separate analysis
first and do the integration in separate steps. There are some tricky
areas, and this would make it easier to see what's being changed.
>
> [*]
> The solver uses isLoopGuardedByCond(), but is able to handle more
> complicated situations by using
> properties of SCEV expressions, and transitivity of ULT/ULE:
> - if a SCEVAddRecExpr is monotonic it checks whether the start and
> exit
> value is known to be ULT Limit.
This is cool :-).
> - for PHI nodes that are SCEVUnknowns it checks whether all parts of
> the PHI are known to be ULT
> - if a value is a constant offset away from a PHI: C + X <u B -> X
> <u B-C
> It also looks for predicates among the basic blocks that dominate the
> memory access, trying to
> find a predicate that is sufficient for the validity of the memory
> access.
>
> Best regards,
> --Edwin
>
> <pointertracking.patch>
Here are some comments on the patch.
> namespace llvm {
> char PointerTracking::ID=0;
>
> [...]
Don't indent namespace contents.
> void PointerTracking::getAnalysisUsage(AnalysisUsage &AU) const {
> AU.addRequired<DominatorTree>();
> AU.addRequired<LoopInfo>();
> AU.addRequired<ScalarEvolution>();
These should be addRequiredTransitive instead of addRequired, so that
the
required analyses are preserved throughout the lifetime of the
PointerTracking analysis.
> callocFunc = M.getFunction("calloc");
[...]
> reallocFunc = M.getFunction("realloc");
I don't know what the current prevailing fasion is with respect to
passes knowing special things about functions with standard C library
names. It would be good to mention that this pass knows about
"calloc" and "realloc" in a high-level comment though.
> const SCEV *PointerTracking::getAllocationSize(Value *V)
This name is a little confusing. Can you rename this to
getAllocationElementCount or something?
> {
> if (AllocationInst *AI = dyn_cast<AllocationInst>(V)) {
> Value *arraySize = AI->getArraySize();
> if (ConstantInt *C = dyn_cast<ConstantInt>(arraySize)) {
> return SE->getConstant(C);
> }
> return SE->getSCEVAtScope(arraySize, LI->getLoopFor(AI-
>getParent()));
It isn't actually necessary to check for a ConstantInt specially
here. getSCEVAtScope will effectively do this automatically.
> }
> if (GlobalVariable *GV = dyn_cast<GlobalVariable>(V)) {
> if (GV->hasInitializer()) {
> Constant *C = GV->getInitializer();
> if (const ArrayType *ATy = dyn_cast<ArrayType>(C->getType
())) {
> return SE->getConstant(Type::Int32Ty, ATy->getNumElements
());
Hard-coding Int32Ty here may be problematic in some cases.
> }
> }
> return SE->getConstant(Type::Int32Ty, 1);
> }
> // TODO: implement more complicated pointer size tracking
> return 0;
> }
> const SCEV *PointerTracking::getAllocationSizeInBytes(Value *V)
> {
> assert(TD && "TargetData must be available to calculate size in
bytes!");
Would it make sense to postpone this check until TD is actually
needed, and then return CouldNotCompute if TD is unavailable?
> const SCEV *S = getAllocationSize(V);
> if (isa<SCEVCouldNotCompute>(S)) {
> if (CallInst *CI = dyn_cast<CallInst>(V)) {
> CallSite CS(CI);
> Function *F = dyn_cast<Function>(CS.getCalledValue()-
>stripPointerCasts());
This line exceeds 80 columns.
> const Loop *L = LI->getLoopFor(CI->getParent());
> if (F == callocFunc) {
> return
> SE->getSCEVAtScope(SE->getMulExpr(SE->getSCEV
(CS.getArgument(0)),
> SE->getSCEV
(CS.getArgument(1))),
> L);
> } else if (F == reallocFunc) {
> return SE->getSCEVAtScope(CS.getArgument(1), L);
> }
> }
> return S;
> }
> uint64_t elementsize = TD->getTypeStoreSize(V->getType());
> return SE->getMulExpr(SE->getConstant(S->getType(),
elementsize), S);
The size of the allocation will be the number of elements multiplied
by the "Alloc" size, not the "Store" size.
> const SCEV *getStart(const SCEV *A, const Loop *L)
> {
Here and elsewhere, the brace for a function definition goes on the
same line as the close paren.
> if (isa<SCEVConstant>(A))
> return A;
This would be a little stronger as A->isLoopInvariant(L).
> DomTreeNodeBase<BasicBlock> *N = DT->getNode(BB);
> DomTreeNodeBase<BasicBlock> *D = N->getIDom();
> while (D) { // D dominates N
> BasicBlock *dBB = D->getBlock();
> bool negated;
> Value *V = getConditionToReach(dBB, BB, negated);
> if (conditionSufficient(V, negated, I, ICmpInst::ICMP_ULT,
Limit))
> return AlwaysTrue;
> BB = dBB;
> D = D->getIDom();
> }
This doesn't appear correct. Even if D dominates N and has a
conditional branch with a successor that reaches N, it's possible
that the other successor of the conditional branch reaches N also.
See ScalarEvolution's getPredecessorWithUniqueSuccessorForBB for
code that solves a similar problem.
Dan
More information about the llvm-commits
mailing list