[llvm-commits] [PATCH] Refactoring MemoryBuiltin Analysis
Nick Lewycky
nlewycky at google.com
Wed Jun 20 17:07:54 PDT 2012
Nice! This looks so much better, thanks!
+/// \brief Evaluate the size and offset of an object ponted by a Value*
+/// statically. Fails if size or offset are not known at compile time.
+class ObjectSizeOffsetVisitor
+: public InstVisitor<ObjectSizeOffsetVisitor, SizeOffsetType> {
Mm, two space before the ":" I think? I'm not sure but it looks wrong with
no indentation at all. Same with ObjectSizeOffsetEvaluator too.
+static Function *getCalledFunction(const Value *V, bool
LookThroughBitCast) {
+ if (LookThroughBitCast)
+ V = V->stripPointerCasts();
+ const CallInst *CI = dyn_cast<CallInst>(V);
if (!CI)
- return false;
+ return 0;
That works for C, but operator new can through exceptions. Please support
InvokeInst as well as CallInst. Sorry. Yes, this means that callers have to
worry about asking for a malloc-like function potentially throwing
exceptions.
+static bool hasNoAliasAttr(const Value *V, bool LookThroughBitCast) {
+ Function *Callee = getCalledFunction(V, LookThroughBitCast);
+ return Callee && Callee->hasFnAttr(Attribute::NoAlias);
+}
When V isa CallInst or InvokeInst, the call or invoke may have noalias (and
other attributes normally found on function prototypes) at the call-site
instead. Yes, llvm supports stating that a function may write memory but at
this particular callsite we happen to know that it won't.
CallSite CS(LookThroughBitCast ? V->stripPointerCasts() :
V->stripPointerCasts());
CS.paramHasAttr(~0U, Attribute::NoAlias);
will test that, but you may want to change CallSite/CallInst/InvokeInst to
add CS.hasFnAttr() to make this way cleaner. See the way
CallSite::doesNotReturn() is implemented for example.
+/// \brief Tests if a value is a call to a library function that allocates
+/// unitialized memory (such as malloc).
+bool llvm::isMallocLikeFn(const Value *V, bool LookThroughBitCast) {
Typo in comment ("unitialized").
+/// \brief Compute the size of the object pointed by Ptr. Returns true and
the
+/// object size in Size if successfull, and false otherwise.
Typo in comment ("successfull").
+ APInt ObjSize = Data.first, Offset = Data.second;
+ // check for overflow
+ if (Offset.slt(0) || ObjSize.ult(Offset))
+ Size = 0;
If there's overflow, you don't want to return false?
+STATISTIC(ObjectVisitorInterproc,
+ "ObjectSizeOffsetVisitor: unhandled interproc value");
+STATISTIC(ObjectVisitorLoad,
+ "ObjectSizeOffsetVisitor: unhandled load instruction");
STATISTIC(ObjectVisitorArgument, "Number of arguments with unsolved size
and offset");
STATISTIC(ObjectVisitorLoad, "Number of load instructions with unsolved
size and offset");
+ObjectSizeOffsetVisitor::ObjectSizeOffsetVisitor(const TargetData *TD,
+ LLVMContext &Context,
+ bool RoundToAlign)
+: TD(TD), RoundToAlign(RoundToAlign) {
I tried to figure out how much the : is supposed to be indented by and
found no consensus, except that it's supposed to be indented. :) Can I buy
four spaces, maybe?
+ // check cache
+ CacheMapTy::iterator CacheIt = CacheMap.find(V);
+ if (CacheIt != CacheMap.end())
+ return CacheIt->second;
// ... invalidate CacheIt ...
+ CacheMap[V] = Result;
One common trick is to say:
SizeOffsetEvalType &Result = CacheMap[V];
but that requires a way to tell the difference between a
default-constructed SizeOffsetEvalType from unknown(). If you can't apply
this change, please at least add a comment before "CacheMap[V] = Result;"
to tell the reader that CacheIt is invalid at this point!
So, one other thing I was thinking. We're potentially visiting the same
instruction chain twice. ObjectSizeOffsetVisitor isn't caching, which means
that calling the evaluator on a chain of instructions like:
void foo(A) {
B = A + 1
C = B + 1
D = C + 1
// ...
Z = Y + 1
if we call the evaluator on Z we'll end up calling the visitor on Y which
will traverse up to A then fail, then the evaluator will walk to Y and call
the visitor on X which will do the same traversal (minus one) and fail.
Ultimately this is O(n^2).
One fix is to add caching, but can we do better?
I think you can submit the next version of the patch instead of mailing it
out for further review.
Nick
On 19 June 2012 16:20, Nuno Lopes <nunoplopes at sapo.pt> wrote:
> Thanks Nick and Chandler for the reviews! Since this patch touches a lot
> of different places, I definitely want to make sure I don't break (too
> many) things.
> I send a new patch in attach, which I believe addresses all concerns
> raised so far.
>
>
>
> I think this should really be two pieces, one which determines the size of
>> things which can be done without modifying the code and a wrapper around
>> that which builds up IR when that fails. Here's roughly what I have in
>> mind:
>>
>
> Well, I gave it a try and I split the code in the new patch.
> The only thing that bothers me is that there is a little code duplication.
> But I agree the code becomes much cleaner.
>
>
>
> Index: lib/Analysis/IPA/**GlobalsModRef.cpp
>> ==============================**==============================**=======
>> --- lib/Analysis/IPA/**GlobalsModRef.cpp (revision 158666)
>> +++ lib/Analysis/IPA/**GlobalsModRef.cpp (working copy)
>> @@ -329,15 +329,8 @@
>> // Check the value being stored.
>> Value *Ptr = GetUnderlyingObject(SI->**getOperand(0));
>>
>> - if (isMalloc(Ptr)) {
>> - // Okay, easy case.
>> - } else if (CallInst *CI = dyn_cast<CallInst>(Ptr)) {
>> - Function *F = CI->getCalledFunction();
>> - if (!F || !F->isDeclaration()) return false; // Too hard to
>> analyze.
>> - if (F->getName() != "calloc") return false; // Not calloc.
>> - } else {
>> + if (!isNoAliasFn(Ptr))
>> return false; // Too hard to analyze.
>> - }
>>
>> I'm not sure this is right.. There's another assumption that globals
>> mod-ref is making which isn't in the comments; it's assuming that the
>> called functions (malloc and calloc) have no side-effects other than the
>> allocation of memory. In general, the callee needs to mark
>> "FunctionInfo[Writers[i]].**GlobalInfo[I] |= Mod" when isNoAlias but can
>> remove that marking when it really is malloc/calloc.
>>
>
> You're right. I replaced that call with isAllocLikeFn().
>
>
>
> +enum AllocType {
>> + MallocLike = 1<<0, // allocates
>> + CallocLike = 1<<1, // allocates + bzero
>> + ReallocLike = 1<<2, // reallocates
>> + StrDupLike = 1<<3,
>> + AllocLike = MallocLike | CallocLike | StrDupLike,
>> + AnyAlloc = MallocLike | CallocLike | ReallocLike | StrDupLike
>> +};
>>
>> I don't understand why StrDup gets special handling? "fopen" also returns
>> newly allocated memory. What they have in common is the noalias marker.
>> Are
>> you using this like calloc to indicate that the memory is initialized to a
>> copy of other memory? Please add comments.
>>
>
> strdup() has to be treated separately because the function to compute the
> object's size (strlen(str)+1) has a different pattern from the others
> (which is a simple multiplication of up to 2 parameters).
> Supporting fopen() here is not very important IMHO, since client code will
> not dereference a FILE* directly, so the information about its size is
> probably not very important. For alias analysis purposes, the function
> should be marked noalias.
>
>
> Thanks,
> Nuno
>
>
>
>
>>
>> Nick
>>
>> On 18 June 2012 14:19, Nuno Lopes <nunoplopes at sapo.pt> wrote:
>>
>> Hi,
>>>
>>> Right now, there are multiple implementations of analysis of memory
>>> allocation functions across the LLVM codebase, including:
>>> - computing size of pointed objects (dead store elimination,
>>> instcombine,
>>> basic alias analysis)
>>> - recognize allocation functions (e.g., to know that the return value
>>> doesn't alias anything else, or to recognize uninitialized allocated
>>> memory
>>> for undef purposes).
>>>
>>> Therefore I send in attach a patch to refactor the MemoryBuiltin Analysis
>>> to include all the functionality needed by the other passes just
>>> mentioned.
>>> It removes a lot of code duplication, and the proposed implementation is
>>> better than all the previous ones combined. The patch adds support for
>>> the
>>> noalias function attribute as well.
>>> This patch also adds support for computing the size of pointed objects at
>>> run-time, which is basically a copy from the bounds checking pass. After
>>> this patch goes in, most of the code in the bounds checking pass can be
>>> removed.
>>>
>>> Any comments/suggestions/etc..?
>>>
>>> Thanks,
>>> Nuno
>>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20120620/c8b99f79/attachment.html>
More information about the llvm-commits
mailing list