[PATCH] D30909: [Analyzer] Finish taint propagation to derived symbols of tainted regions

Tue Apr 18 06:55:23 PDT 2017

NoQ added inline comments.

================
Comment at: lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:494
+  SymbolManager &SM = C.getSymbolManager();
+  return SM.getDerivedSymbol(Sym, LCV.getRegion());
 }
----------------
vlad.tsyrklevich wrote:
> NoQ wrote:
> > vlad.tsyrklevich wrote:
> > > NoQ wrote:
> > > > I'd think about this a bit more and come back.
> > > > 
> > > > I need to understand how come that constructing a symbol manually is the right thing to do; that doesn't happen very often, but it seems correct here.
> > > Indeed it is odd. The best justification I could come up with: LCVs are meant to be optimizations, their 'purpose' is to expose an SVal that hides SymbolRef values so that we can have a split store. We don't have to copy all of a compound values SymbolRef mappings because LCVs are kept distinct. Hence to set/query/constrain region values you use SVals so that RegionStore can differentiate between LCVs and SymbolRef backed SVals for the two different stores it contains.
> > > 
> > > The taint interface however requires you taint a SymbolRef, not an SVal. If we wanted, instead of doing this logic here, we could change getPointedToSymbol() to return an SVal and update usages of it accordingly since that value is only passed on to ProgramState.isTainted()/ProgramState.addTaint() anyway. Then we could update addTaint/isTainted to perform this logic, hiding it from the checker.
> > > 
> > > This still requires manually constructing a symbol, now it's just performed in the analyzer instead of in a checker. Not sure if that addresses the issue you were considering, but the idea that we need to 'undo' the LCV optimization hiding the SymbolRef to have a value to taint seems somewhat convincing to me. What do you think?
> > Hmm (!) I suggest adding a new function to the program state, that we'd call `addPartialTaint()` or something like that, and this function would accept a symbol and a region and would act identically to passing a derived symbol (from this symbol and that region) to `addTaint()` (but we wouldn't need to actually construct a derived symbol here).
> > 
> > Such API would be easier to understand and use than the current approach that forces the user to construct a derived symbol manually in the checker code. Unfortunately, this checker's `getLCVSymbol()` would become a bit more complicated (having various return types depending on circumstances), but this misfortune seems more random than systematic to me.
> > 
> > Since we're having this new kind of partial taint, why don't we expose it in the API.
> I'm happy to implement it this way, but figured I'd ask why you prefer this approach first in the interest of keeping the TaintChecker simple! The optimal approach to me seems to be changing `getPointedToSymbol()` to `getPointedToSVal()` and having `addTaint(SVal)` call `addPartialTaint()` when it's passed an LCV sub-region. That way users of the taint interface like the TaintChecker have a clean way to add & check regardless of whether it's a SymbolRef or an LCV but the partial taint functionality is still exposed and documented for those who might want to use it in new ways.
> 
> Just curious to understand your rationale. Thanks for the feedback!
Your idea actually looks good to me! I'd approve going this way.

With this change to `addTaint(SVal)`, i suspect it'd need some extra documentation to explain what it does now.

================
Comment at: lib/StaticAnalyzer/Core/ProgramState.cpp:676
+  if (const SymbolDerived *SD = dyn_cast<SymbolDerived>(Sym)) {
+    TaintedSymRegionsRef SymRegions(0, TSRFactory.getTreeFactory());
+
----------------
The `SymRegions` name is a bit confusing because we often shorten `SymbolicRegion` to `SymRegion` (eg. in dumps), which is not what we mean.

================
Comment at: lib/StaticAnalyzer/Core/ProgramState.cpp:682
+
+    SymRegions = SymRegions.add(SD->getRegion());
+    NewState = NewState->set<DerivedSymTaint>(SD->getParentSymbol(), SymRegions);
----------------
I wonder if it's worth it to check if a super-region of this region is already tainted, and avoid adding the region in this scenario.

I guess in practice it won't happen very often, because this code would most likely be executed just once per taint source. This probably deserves a comment though.

https://reviews.llvm.org/D30909