[PATCH] D69813: [analyzer] CERTStrChecker: Model gets()

Csaba Dabis via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Nov 13 11:46:00 PST 2019


Charusso added inline comments.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp:201-207
+void CERTStrChecker::evalGets(const CallEvent &Call, const CallContext &CallC,
+                              CheckerContext &C) const {
+  unsigned DestPos = *CallC.DestinationPos;
+  const Expr *DestArg = Call.getArgExpr(DestPos)->IgnoreImpCasts();
+  SVal DestV = Call.getArgSVal(DestPos);
+
+  auto Report = getReport(*BT, Call, CallC, C);
----------------
NoQ wrote:
> NoQ wrote:
> > All right, so basically what you're saying is that literally every invocation of `gets()` deserves a warning. This means that for all practical purposes your checker //is// an AST-based checker, just implemented with path-sensitive callbacks. A path-sensitive checker emits warnings based on multiple events that happen sequentially along the path (use-after-free: "memory deallocated - that same memory used", division by zero: "value constrained to zero - something is being divided by that same value", etc.) but your checker emits the warning by looking at only one statement: "`gets()` is invoked".
> > 
> > Do i understand correctly that your plan is to use path-sensitive analysis for fixits only? But you can't emit fixits for any truly path-sensitive warning anyway. Fixits must work correctly on all execution paths, so you cannot emit a correct fixit by looking at only one execution path. In order to emit fixits, you need to either resort to a pure AST check anyway ("this expression refers to an array of fixed size"), or maybe implement auxiliary data-flow analysis for a certain must-problem (eg., "the buffer argument may have exactly one possible value across all paths that reach `gets()`"). But in both cases the path-sensitive engine does literally nothing to help you; all the data that you'll need for your fixit will be available from the AST.
> Like, i think this was an interesting investigation and i was genuinely curious about how this turns out to be, but for now it seems that the problem you're trying to solve cannot be solved this way. Path sensitive analysis is fundamentally applicable to only 50% of the problems (to "may-problems" but not to "must-problems"), and the problem you're trying to solve is in the latter category. I believe you'll have to fall back to the relatively boring task of adding fixits to `security.insecureAPI.gets`; but then, again, if you manage to employ use-def chains for this problem, that might be quite an inspiring start.
Most of the time the given allocation to hold the arbitrary string happens in a local scope. After that we see `fscanf(dst)`, `gets(dst)`, `memcpy(dst)`, `strncpy(dst)`, stuff... which pushes new data into that memory block, and then the cool developers write that down: `dst[42] = '\0'` which means all the reports should be thrown away in a path-sensitive manner on `dst`. Reallocations, re-bindings, non-AST stuff could handled very easily with the non-AST checker, like that one.

Sometimes we are work with destination-array like `memcpy(Foo[Bar.Baz]->Qux, ...)` which could not really handled with just a simple AST-based checker. I could not say at the moment we could handle it with symbols, but we have a much larger scope of information by symbols.

Most of the time because of the Analyzer is much smarter than the Tidy we could emit fix-its with the help of flow-sensitiveness very easily. I would create huge white-lists what we want to fix-it, and what we could not, but at some point if we model the symbols better, we can.

Other than that easy false-positive suppression and tiny flow-sensitive rebinding stuff, we could be sure what is going on by each string-manipulation. The `gets()` is a toy example where at most a `grep -rn 'gets('` could do better analysis than us.

The real world looks like that:
```
1	encryptedpasswordlen = ((strlen(passwd) + RADIUS_VECTOR_LENGTH - 1)
 	/ RADIUS_VECTOR_LENGTH) * RADIUS_VECTOR_LENGTH;
2	cryptvector = palloc(strlen(secret) + RADIUS_VECTOR_LENGTH);
3	memcpy(cryptvector, secret, strlen(secret));
...
4	for (i = 0; i < encryptedpasswordlen; i += RADIUS_VECTOR_LENGTH) {
5	  memcpy(cryptvector + strlen(secret), md5trailer, RADIUS_VECTOR_LENGTH);
...
```
- from `postgresql/src/backend/libpq/auth.c`

At `3` we would emit a warning, because the null-termination left by the wrong size of the string, but at `5` we see that, it left, because at that offset the string continues, and dunno, on `6` when we model every flow-sensitive information, the string left non-terminated.

Of course each of that stuff is local and AST-based (with huge overhead of rebindings and impossible false-positive suppression), but when you have two of it, that is when the fun begins.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69813/new/

https://reviews.llvm.org/D69813





More information about the cfe-commits mailing list