[PATCH] D30909: [Analyzer] Finish taint propagation to derived symbols of tainted regions

Vlad Tsyrklevich via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Sat May 13 18:01:27 PDT 2017


vlad.tsyrklevich added inline comments.


================
Comment at: lib/StaticAnalyzer/Core/ProgramState.cpp:656-659
+  // If the SVal is a LazyCompoundVal it might only cover sub-region of a given
+  // symbol. For example, the LCV might represent a field in an uninitialized
+  // struct. In this case, the LCV encapsulates a conjured symbol and reference
+  // to the sub-region representing the struct field.
----------------
NoQ wrote:
> vlad.tsyrklevich wrote:
> > NoQ wrote:
> > > I still feel bad about producing API with very tricky pre-conditions. The LCV may have various forms - some may have empty store with no symbols at all, and others may be full of direct bindings that make the symbol essentially irrelevant. However, because the taint API is designed to be defensive to cases when taint cannot be added, and it sounds like a good thing, i guess we've taken the right approach here :)
> > > 
> > > I suggest commenting this more thoroughly though, something like:
> > > 
> > > > If the SVal represents a structure, try to mass-taint all values within the structure. For now it only works efficiently on lazy compound values that were conjured during a conservative evaluation of a function - either as return values of functions that return structures or arrays by value, or as values of structures or arrays passed into the function by reference, directly or through pointer aliasing. Such lazy compound values are characterized by having exactly one binding in their captured store within their parent region, which is a conjured symbol default-bound to the base region of the parent region.
> > > 
> > > Then inside `if (Sym)`:
> > > 
> > > > If the parent region is a base region, we add taint to the whole conjured symbol.
> > > 
> > > > Otherwise, when the value represents a record-typed field within the conjured structure, so we add partial taint for that symbol to that field.
> > The pre-conditions for using the API are actually a bit simpler than what's exposed here. I made it explicit to make the logic for tainting LCVs explicit, but the following simpler logic works:
> > ```
> >   if (auto LCV = V.getAs<nonloc::LazyCompoundVal>()) {
> >     if (Optional<SVal> binding = getStateManager().StoreMgr->getDefaultBinding(*LCV)) {
> >       if (SymbolRef Sym = binding->getAsSymbol()) {
> >         return addPartialTaint(Sym, LCV->getRegion(), Kind);
> >       }
> >     }
> >   }
> > ```
> > 
> > This works because `addPartialTaint()` actually already performs the 'getRegion() == getRegion()->getBaseRegion()` check already and taints the parent symbol if the region is over the base region already. I chose to replicate it here to make the logic more explicit, but now that I've written this down the overhead of duplicating the logic seems unnecessary. Do you agree?
> > The pre-conditions for using the API are actually a bit simpler than what's exposed here.
> 
> I'm talking about the situation when we add the partial taint to the default-bound symbol but it has no effect because there's a direct binding in the lazy compound value on top of it. The user should ideally understand why it doesn't work that way.
> 
> > I chose to replicate it here to make the logic more explicit, but now that I've written this down the overhead of duplicating the logic seems unnecessary. Do you agree?
> 
> The variant without the check looks easier to read, and it is kind of obvious that full taint is a special case of partial taint, so i'm for removing the check.
> when we add the partial taint to the default-bound symbol but it has no effect because there's a direct binding in the lazy compound value on top of it

Ah, so you're talking about the case where the LCV encompasses a value with both direct & default bindings, e.g. `int foo[1024]; foo[123] = rand(); taint(foo);`? In that case we will miss tainting the `rand` SymbolConjured. I suppose we could scan the region store for matching entries? In practice I think we're really only interested in tainting default bindings anyway for some unknown network/user input anyway.

Anyways, your comment makes more sense now. I've added it.


================
Comment at: lib/StaticAnalyzer/Core/ProgramState.cpp:788
+              (R == I->first || R->isSubRegionOf(I->first)))
+            return true;
+        }
----------------
NoQ wrote:
> I actually have no idea why does this function accumulate things in the `Tainted` variable, instead of returning :)
It made a little more stylistic sense before this change, but not so much now. I've updated them to return immediately.


https://reviews.llvm.org/D30909





More information about the cfe-commits mailing list