[cfe-dev] [analyzer] Expressing dependencies between arguments and return values

Artem Dergachev via cfe-dev cfe-dev at lists.llvm.org
Tue May 8 17:37:12 PDT 2018



On 5/7/18 11:59 AM, Aleksei Sidorin via cfe-dev wrote:
> Hello Peter,
>
> 07.05.2018 01:07, Peter Wu via cfe-dev пишет:
>> Hi!
>>
>> I just got started with writing a Clang Static Analyzer checker,
>> primarily to detect these classes of issues in C:
>>
>> - allocator/deallocator mismatch: e.g. "g_malloc" must be accompanied by
>>    "g_free" and not "free".
>> - memory leak due to allocator mismatch: e.g. "g_list_new" must be
>>    accompanied by "g_list_free" and not "g_free".
>> - memory leaks and use-after-free issues in general.
>>

The easiest way to accomplish these would be to extend the existing 
functionality of MallocChecker. It already warns about mismatching 
malloc()/operator delete etc. in c++ by tracking allocation families, so 
it most likely has all the functionality you need. It even already has 
support for g_malloc and g_free, but not for many other glib functions.

>> I have an initial version working based on Anna's and Jordan's talk

You might want to look into my workbook as well:
https://github.com/haoNoQ/clang-analyzer-guide/releases/download/v0.1/clang-analyzer-guide-v0.1.pdf

>> , but
>> am looking at methods to reduce false positives and false negatives.
>> Consider this example:
>>
>>      void checkIdentityFunctionLeak() {
>>        char *p = g_strdup("");
>>        char *q = g_strdelimit(p, "_", '-');
>>      } // expected-warning {{Memory leak}}
>>
>> To avoid false positives with memleak reporting, I added a pointerEscape
>> callback that removes symbols. This however has as side-effect that "p"
>> in the above example is not reported as leaked.

MallocChecker has a relatively sophisticated mechanism to figure out if 
a certain function deserves a pointer escape or not. You might want to 
tweak it. The idea to provide a correct return value is a good call but 
it isn't necessary.

>> "g_strdelimit" can be
>> modelled as an identity function (p==q), but my attempt to express this
>> constraint in the PostCall callback failed:
>>
>>      SValBuilder &svalBuilder = C.getSValBuilder();
>>      ProgramStateRef State = C.getState();
>>      SVal Arg0 = Call.getArgSVal(0);
>>      SVal RetVal = Call.getReturnValue();
>>      DefinedOrUnknownSVal arg0 = Arg0.castAs<DefinedOrUnknownSVal>();
>>      DefinedOrUnknownSVal retVal = 
>> RetVal.castAs<DefinedOrUnknownSVal>();
>>      DefinedOrUnknownSVal PtrEQ = svalBuilder.evalEQ(State, arg0, 
>> retVal);
>>      State = State->assume(PtrEQ, true);
>>      C.addTransition(State);
> The best way to tell the analyzer that some expression returns 
> _exactly same_ value as the argument is to bind the value of the 
> argument as the return result of this expression.
>
>     ProgramStateRef State = C.getState();
>     SVal Arg0Val = Call.getArgSVal(0);
>     State = State->BindExpr(Call.getOriginExpr(), 
> C.getLocationContext(), Arg0Val);
>     C.addTransition(State);
>
> Our primary solver (RangeConstraintManager) has very limited 
> possibilities of handling complex symbolic expressions so it is better 
> to simplify them as much as possible.

I'd like to add to this that you shouldn't do BindExpr in checkPostCall 
or checkPostStmt, only in evalCall. Because otherwise it'd conflict with 
other checkers that may try to model that function.

For modeling string operations you should have a look at CStringChecker 
and extend it.

>> While the debug.ViewExplodedGraph option shows that the symbols are
>> assumed to be the same in the CallExpr node:
>>
>>      conj_$1{gpointer} : { [1, 18446744073709551615] }
>>      (conj_$4{gchar *}) - (conj_$1{gpointer}) : { [0, 0] }
>>
>> this range information is somehow lost in the next nodes.
>>
>> Another example which is currently not caught either (strstr returns an
>> address from the region of "p"):
>>
>>      int main() {
>>          char *p = strdup("xyz");
>>          if (!p) return 1;
>>          char *p2 = strstr(p, "z");
>>          free(p);
>>          puts(p2); // expected-warning {{Use-after-free}}
>>      }
>>
>> Have I misunderstood something in modelling this?Is it possible to
>> express dependencies between the return value and the arguments of a
>> library function?
>
> strstr() lacks modeling in analyzer so it just returns a conjured 
> value. A correct return value should be an ElementRegion whose base 
> region is the region pointed by 'p'. You can try construct this return 
> value using MemRegionManager methods.
>
> Hope this helps.
>
The high-level API for this stuff is SValBuilder::evalBinOp(), just add 
an integer to a pointer. See how CStringChecker handles functions that 
return the end of the string (look out for the returnEnd flag).



More information about the cfe-dev mailing list