[cfe-dev] Weak function pointers (was "SymbolRef and SVal confusion")
Richard
tarka.t.otter at googlemail.com
Tue Jan 15 07:21:22 PST 2013
Hi Jordan,
On 15 Jan 2013, at 03:44, Jordan Rose <jordan_rose at apple.com> wrote:
> That's not quite going to work -- what if I explicitly spell out my comparison?
>
>> if (MyWeakFunction == NULL) {
>> }
>
> In this case, SimpleConstraintManager won't go through your new assumeLocSymbol.
>
> ...although actually, it won't make it there at all, because SimpleSValBuilder::evalBinOpLL folds null comparisons against non-symbolic regions. That's actually easy enough to fix, though, and if you do that this might actually work.
I am not sure I follow you here. Why will this not work? On line 646 of SimpleSValBuilder we have:
  if (SymbolRef lSym = lhs.getAsLocSymbol())
    return MakeSymIntVal(lSym, op, rInt->getValue(), resultTy);
The metadata symbol will be returned here by getAsLocSymbol() and everything proceeds as expected. Testing this on the following trivial code shows a divide by zero warning for both branches of the IfStmt:
#include <stddef.h>

int myFunc() __attribute__((weak_import));

int main(int argc, char *argv[])
{
  if (myFunc == NULL) {
    1 / 0;
  } else {
    1 / 0;
  }
  return 0;
}
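For readers unfamiliar with the runtime idiom the analyzer has to model here, the following is a minimal sketch and not part of the patch. It assumes an ELF target and a GCC- or Clang-compatible compiler, uses the generic `weak` attribute in place of the Darwin-specific `weak_import`, and the names `optionalFeature` and `callOrDefault` are hypothetical.

```cpp
// Hypothetical weak declaration: if no definition is linked in, the
// address of optionalFeature is null at run time.
extern "C" int optionalFeature() __attribute__((weak));

// Calls optionalFeature when it is available and returns Default otherwise.
int callOrDefault(int Default) {
  if (optionalFeature == nullptr)
    return Default;
  return optionalFeature();
}
```

Either outcome of the comparison is possible depending on what is linked, which is why the analyzer should treat both branches of the IfStmt above as feasible.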
>
> Oh, and metadata symbols also die unless some checker specifically requests to keep them alive. I'm wondering now if the current behavior of SymbolExtent (immutable, lasts as long as the base region, only as path-specific as their region) is, in fact, closer to the desired behavior here than metadata symbols (invalidatable, path-sensitive, only lasts as long as someone is interested). If this is the case, maybe SymbolExtent should be made more generic.
>
> (Sorry for giving you a runaround here. I'm not sure what the right thing to do is...I just want to avoid "new symbols" being the solution to every problem, and at the same time make sure that the existing symbols have well-defined semantics.)
>
OK, I was under the impression that a SymbolMetadata would stay alive as long as its MemRegion did, but I see from the docs that this is incorrect. I confess I am not exactly an expert on the Symbol class hierarchy, but it seems to me that SymbolExtent is not the right thing to use here. It has the properties we want, but as you mentioned before, it represents the size of a region. That seems an odd symbol to use to represent a possibly NULL pointer to a function. What about using a SymbolRegionValue here?
> It's still a bit of a hack, but it's reaching the point where it's strictly better than what we're doing without introducing new holes in the analyzer logic. Thanks for working on this.
>
> Finally...
> - Don't forget test cases for the weak functions themselves! You need to verify (perhaps using clang_analyzer_eval) that a weak function starts out unknown, but can be constrained to null or non-null.
> - I think you can prune down the number of test cases lifted from other tests...you just need one to show that each function is still being treated specially and not as a generic opaque system function.
Yes, of course, silly omission.
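A sketch of what such a test might look like, written as C++ for illustration. `weakFunc` and the two test function names are hypothetical, and the trailing comments record the results Jordan asks for rather than verified analyzer output. The stub body guarded by `__clang_analyzer__` is an assumption added only so that the file also links as an ordinary program; analyzer tests normally just declare `clang_analyzer_eval`.

```cpp
// Interpreted by the analyzer's debug.ExprInspection checker.
void clang_analyzer_eval(bool);

#ifndef __clang_analyzer__
// Stub for ordinary (non-analyzer) builds only; it just counts calls.
int EvalCalls = 0;
void clang_analyzer_eval(bool) { ++EvalCalls; }
#endif

// Hypothetical weak function used only by this sketch.
extern "C" int weakFunc() __attribute__((weak));

void testWeakFunctionStartsUnknown() {
  clang_analyzer_eval(weakFunc == nullptr); // Desired: UNKNOWN.
}

void testWeakFunctionCanBeConstrained() {
  if (weakFunc == nullptr) {
    clang_analyzer_eval(weakFunc == nullptr); // Desired: TRUE.
    return;
  }
  clang_analyzer_eval(weakFunc == nullptr); // Desired: FALSE.
}
```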
> - The initialization of FunctionTextRegion's WeakSym can happen in the constructor initializers rather than the body.
Yes, I wanted to do this, but it seemed like a bit of a chicken-and-egg problem. Creating the symbol requires the FunctionTextRegion, so how can the FunctionTextRegion have the symbol available in its constructor initializers? The only way I can see is to pass it the SymbolManager so it can create the symbol itself, which also seemed a bit odd. Or am I being stupid?
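One way to picture the second option is a toy model, not analyzer code: `ToySymbol`, `ToySymbolManager`, and `ToyFunctionRegion` are hypothetical stand-ins for SymbolMetadata, SymbolManager, and FunctionTextRegion. The region receives the manager and creates its own symbol in the constructor initializers, handing over `this` before the constructor body runs.

```cpp
#include <deque>

struct ToyFunctionRegion;

// Stand-in for a metadata symbol: it only records which region owns it.
struct ToySymbol {
  const ToyFunctionRegion *Owner;
};

// Stand-in for the symbol manager. A deque keeps element addresses stable
// when new symbols are appended.
class ToySymbolManager {
  std::deque<ToySymbol> Symbols;

public:
  const ToySymbol *createSymbolFor(const ToyFunctionRegion *R) {
    Symbols.push_back(ToySymbol{R});
    return &Symbols.back();
  }
};

// Stand-in for the function region. Only weak functions get a symbol.
struct ToyFunctionRegion {
  bool IsWeak;
  const ToySymbol *WeakSym;

  ToyFunctionRegion(bool IsWeak, ToySymbolManager &SymMgr)
      : IsWeak(IsWeak),
        WeakSym(IsWeak ? SymMgr.createSymbolFor(this) : nullptr) {}
};
```

Passing `this` from an initializer is fine as long as the symbol only stores the pointer. The cost is that the region now depends on the symbol manager, which is the oddity described above.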
> - Please be careful about the LLVM coding conventions. You got most of them, but there are still a few issues: comments should be full, capitalized sentences, lines should have no trailing whitespace (even blank lines), and types should have the * or & attached to the name.
>
Will do, thought I had caught everything this time…
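A small illustration of the conventions listed above (capitalized names, two-space indent, `*` attached to the name, full-sentence comments, lines within 80 columns). `countNonNull` is a hypothetical function, not code from the patch.

```cpp
#include <cstddef>

// Returns the number of non-null entries in the first Size slots of Items.
unsigned countNonNull(const int *const *Items, std::size_t Size) {
  unsigned Count = 0;
  for (std::size_t I = 0; I != Size; ++I) {
    const int *Item = Items[I];
    if (Item)
      ++Count;
  }
  return Count;
}
```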
> Looking better...
> Jordan
>
>
> On Jan 12, 2013, at 8:43 , Richard <tarka.t.otter at googlemail.com> wrote:
>
>> Hey Jordan,
>>
>> I agree the solution was a bit messy; it seemed like there were too many places that had to test what kind of SymbolicRegion they had. I prefer your idea of attaching a metadata symbol to the FunctionTextRegion for weak functions; this seems a lot cleaner to me. How about the attached diff? I also added some tests for dispatch_once and CFCopy… method.
>>
>> Thoughts?
>>
>> <weak-function-symbols.diff>
>>
>> On 12 Jan 2013, at 03:39, Jordan Rose <jordan_rose at apple.com> wrote:
>>
>>> Hi, Richard. Sorry for the delay in responding.
>>>
>>> Hm. I'm not sure why we need a new symbol type specifically for functions, but I could understand the use for generic weak-linked symbols (unfortunate overloading there). After all, in this case "&foo" may be null:
>>>
>>> extern int foo __attribute__((availability(macosx, 10.4)));
>>>
>>> I'm not sure how we want to represent this in the analyzer, though. We wouldn't want all MemRegions to have associated symbols.
>>>
>>> ...actually, I can think of reasons why we would, at least for all top-level MemRegions. But that's quite a redesign; it's probably not something we're going to do right now.
>>>
>>> In any case, this seems oddly intrusive, with far more places needing to know about SymbolWeakFunction than seems strictly necessary. Having to choose one of two regions available in a single SVal seems very strange, and I'm worried that we won't really be getting this right all the time.
>>>
>>> I'm sorry I'm not able to think through a solid design with you right now; I've got a number of other projects going on. I haven't thought through the ramifications of this yet, but what if you inverted this design and instead gave weak FunctionTextRegions associated metadata symbols with function pointer type? That way it'd be much easier to figure out where FunctionTextRegions need to opt in to SymbolicRegion-like behavior, and at worst we'd handle these functions the same way we did before. ...Again, I haven't really thought it all the way through, but what do you think?
>>>
>>> By the way, please make sure your code conforms to the LLVM coding conventions. In particular, we capitalize local variable names, indent with two spaces, attach & and * to the variable name rather than the type name, and keep all lines within 80 columns.
>>>
>>> As for your test cases, I'd personally suggest not using clang_analyzer_eval as a weak function—its only purpose is to be interpreted by the analyzer. malloc() tests checkers evaluating calls. Two other things that might be worth borrowing are dispatch_once, which is implemented with a synthesized body, and some made up CoreFoundation function like CFCopyFoo, which the analyzer handles with a post-call callback to produce a leak warning.
>>>
>>> Jordan
>>>
>>>
>>> On Jan 7, 2013, at 10:22 , Richard <tarka.t.otter at googlemail.com> wrote:
>>>
>>>> Hey Jordan,
>>>>
>>>> How about something like the attached diff? I have included a test case that just takes some of the other tests on C functions, redeclares the functions as weak, and runs the tests again. I also tried making all function pointers weak and running the test cases, which also pass. A couple of test cases needed modifying to return the underlying FunctionTextRegion instead of the SymbolicRegion for weak function pointers, which seems a bit messy; I don't know if you have any ideas about a nicer solution to this. Maybe getAsRegion() should always return the FunctionTextRegion if it is wrapping a SymbolWeakFunction?
>>>>
>>>> Richard.
>>>>
>>>> <WeakFuncs.diff>
>>>> <weak-functions.c>
>>>>
>>>> On 5 Jan 2013, at 03:30, Jordan Rose <jordan_rose at apple.com> wrote:
>>>>
>>>>> VisitCast handles the "decay" in the AST from a raw function name to a function pointer; all C function calls are actually calls to function pointers according to the standard. But the actual code that figures out the function to call is in CallEventManager::getSimpleCall, which...huh, doesn't actually look at the callee's SVal if it's known at compile time. Which means only calls through weak function pointers would lose out. I would actually be okay with this since these are (a) rare, and (b) probably not calls we do much special processing for anyway.
>>>>>
>>>>> If you want to try hacking this in, I'd suggest using a conjured symbol with no Expr and no block count (so it's the same all across the program) and the appropriate pointer-to-function type:
>>>>>
>>>>> QualType Ty = Ctx.getPointerType(FD->getType());
>>>>> SVB.conjureSymbol(/*Stmt=*/0, /*LCtx=*/0, Ty, /*VisitCount=*/0, /*Tag=*/FD);
>>>>>
>>>>> And then come up with a bunch of test cases and make sure that if you, say, define "malloc" as weak that we still treat it like "malloc". If everything works, send it back and I'll commit it to SVN.
>>>>>
>>>>> Thanks for working on this!
>>>>> Jordan
>>>>>
>>>>>
>>>>> On Jan 3, 2013, at 13:15 , Richard <tarka.t.otter at googlemail.com> wrote:
>>>>>
>>>>>> Hey Jordan,
>>>>>>
>>>>>> I realise SymbolExtent is the wrong symbol class to use; it was just a quick hack to see how much more work was involved in getting the analyser to assume false on function decls. Not very much, it turned out. I guess a new SymExpr subclass is needed.
>>>>>>
>>>>>> The bit I am not clear on is where the analyser calls a function, where I would need to add code to handle this new symbol type. Apologies if this is a stupid question, I had a dig through ExprEngine, but did not find what I was looking for. Is it VisitCast?
>>>>>>
>>>>>> Ta.
>>>>>>
>>>>>> On 3 Jan 2013, at 20:22, Jordan Rose <jordan_rose at apple.com> wrote:
>>>>>>
>>>>>>> SymbolExtent isn't really meant for this; it's supposed to represent the metadata of how large an allocation is in memory. Doing this is basically like changing "return func" to "return sizeof(*func)", except that functions don't really have valid sizes anyway. You really can't put an extent symbol (type size_t) into a loc::MemRegionVal (some kind of pointer-ish thing).
>>>>>>>
>>>>>>> In practice, this lets you do the null test, but won't actually let the analyzer call the function, which is no good.
>>>>>>>
>>>>>>> I don't have any other immediate insights to offer. We just don't have values that can represent either null or a specific function at this time. You might be able to fake it for now by adding a pre-visit check for CastExprs of type CK_FunctionToPointerDecay, and eagerly splitting the path whenever someone references a weak function.
>>>>>>>
>>>>>>> Jordan
>>>>>>>
>>>>>>>
>>>>>>> On Jan 3, 2013, at 10:03 , Richard <tarka.t.otter at googlemail.com> wrote:
>>>>>>>
>>>>>>>> I had a quick attempt at this, by creating a SymbolExtent of a weak function decl code region and creating a SymbolicRegion with that. This actually fixes the checker I was writing, which is nice. I am not sure if I understand fully the implications of doing this however. Where does the SymbolicRegion need to be constrained back to a FunctionTextRegion?
>>>>>>>>
>>>>>>>> Index: Core/SValBuilder.cpp
>>>>>>>> ===================================================================
>>>>>>>> --- Core/SValBuilder.cpp (revision 171384)
>>>>>>>> +++ Core/SValBuilder.cpp (working copy)
>>>>>>>> @@ -190,7 +190,13 @@
>>>>>>>> }
>>>>>>>>
>>>>>>>> DefinedSVal SValBuilder::getFunctionPointer(const FunctionDecl *func) {
>>>>>>>> -  return loc::MemRegionVal(MemMgr.getFunctionTextRegion(func));
>>>>>>>> +  const FunctionTextRegion *Region = MemMgr.getFunctionTextRegion(func);
>>>>>>>> +  if (func->isWeak()) {
>>>>>>>> +    const SymbolExtent *Sym = SymMgr.getExtentSymbol(Region);
>>>>>>>> +    return loc::MemRegionVal(MemMgr.getSymbolicRegion(Sym));
>>>>>>>> +  }
>>>>>>>> +
>>>>>>>> +  return loc::MemRegionVal(Region);
>>>>>>>> }
>>>>>>>>
>>>>>>>> DefinedSVal SValBuilder::getBlockPointer(const BlockDecl *block,
>>>>>>>>
>>>>>>>> On 20 Dec 2012, at 19:31, Ted Kremenek <kremenek at apple.com> wrote:
>>>>>>>>
>>>>>>>>> On Dec 20, 2012, at 10:14 AM, Jordan Rose <jordan_rose at apple.com> wrote:
>>>>>>>>>
>>>>>>>>>> The problem is that functions are represented by FunctionTextRegions. As you noticed, our design is that only SymbolicRegions can represent NULL—all other regions are known to have an address. However, this is not true for weak symbols (functions or otherwise). In order to get this right, we probably need to enhance the analyzer to treat weak extern symbols like references, and then automatically dereference them upon use.
>>>>>>>>>
>>>>>>>>> I don't think the "references" analogy is quite right. Functions are already modeled in the AST using function pointers, and they are dereferenced during a function call. We could possibly model weak-linked functions using SymbolicRegions, that are then later constrained to alias a specific FunctionTextRegion. Aliasing is something we need to handle better anyway, and I think this would nicely fit into that model.
>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>>
>
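The design constraint that runs through the whole thread (only SymbolicRegions can be null, every other region is known to have an address) can be pictured with a toy model. This is a sketch with hypothetical names (`ToyRegionKind`, `canAssumeNull`, `canAssumeNonNull`, `feasibleBranches`), not the analyzer's constraint manager.

```cpp
// Two of the region kinds discussed in the thread, reduced to a tag.
enum class ToyRegionKind { FunctionText, Symbolic };

// A concrete function region always has an address, so assuming it is null
// is infeasible. A symbolic region may be null.
bool canAssumeNull(ToyRegionKind Kind) {
  return Kind == ToyRegionKind::Symbolic;
}

// Assuming non-null is feasible for both kinds in this model.
bool canAssumeNonNull(ToyRegionKind) { return true; }

// Returns how many branches of "if (F == NULL)" remain feasible.
unsigned feasibleBranches(ToyRegionKind Kind) {
  return (canAssumeNull(Kind) ? 1u : 0u) + (canAssumeNonNull(Kind) ? 1u : 0u);
}
```

In this model an ordinary function leaves one feasible branch, while a weak function treated as symbolic leaves two, which matches the two warnings reported for the test program at the top of this message.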