[PATCH] D144269: [Analyzer] Show "taint originated here" note of alpha.security.taint.TaintPropagation checker at the correct place

Daniel Krupp via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Sat Apr 22 08:34:03 PDT 2023


dkrupp added a comment.

@steakhal your comments are fixed. Thanks for the review.



================
Comment at: clang/lib/StaticAnalyzer/Checkers/DivZeroChecker.cpp:108-109
+  if ((stateNotZero && stateZero)) {
+    std::vector<SymbolRef> taintedSyms = getTaintedSymbols(C.getState(), *DV);
+    if (!taintedSyms.empty()) {
+      reportTaintBug("Division by a tainted value, possibly zero", stateZero, C,
----------------
steakhal wrote:
> 
We cannot get rid off the getTaintedSymbols() call, as we need to pass all tainted symbols to reportTaintBug if we want to track back multiple variables. taintedSyms is a parameter of reportTaintBug(...)


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:863-868
+      std::vector<SymbolRef> TaintedSyms =
+          getTaintedSymbols(State, Call.getReturnValue());
+      if (!TaintedSyms.empty()) {
+        TaintedSymbols.push_back(TaintedSyms[0]);
+        TaintedIndexes.push_back(ArgNum);
+      }
----------------
steakhal wrote:
> 
I think this suggested solution would not be correct here, as ArgSym might not be the actual _tainted_ symbol (inside a more complex expression).

So I would prefer to leave it like this for correctness.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:875-879
+      std::vector<SymbolRef> TaintedSyms = getTaintedSymbols(State, *V);
+      if (!TaintedSyms.empty()) {
+        TaintedSymbols.push_back(TaintedSyms[0]);
+        TaintedIndexes.push_back(ArgNum);
+      }
----------------
steakhal wrote:
> In these cases, the code would acquire all the tainted subsymbols, which then we throw away and keep only the first one.
> This is why I suggested the approach I did I'm my last review.
I think this suggested solution would not be correct here, as ArgSym might not be the actual _tainted_ symbol (inside a more complex expression).

So I would prefer to leave it like this for correctness.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/Taint.cpp:294
+      std::vector<SymbolRef> TaintedCasts =
+          getTaintedSymbols(State, SC->getOperand(), Kind);
+      TaintedSymbols.insert(TaintedSymbols.begin(), TaintedCasts.begin(),
----------------
steakhal wrote:
> If `returnFirstOnly` is `true`, this `getTaintedSymbols()` call would still eagerly (needlessly) collect all of the symbols.
> I'd recommend propagating the `returnFirstOnly` parameter to the recursive calls to avoid this problem.
> I also encourage you to make use of the `llvm::append_range()` whenever makes sense.
You are perfectly right. I overlooked these calls and because of the the default parameter did got get a warning.  now fixed.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/Taint.cpp:147-150
 bool taint::isTainted(ProgramStateRef State, const Stmt *S,
                       const LocationContext *LCtx, TaintTagType Kind) {
-  SVal val = State->getSVal(S, LCtx);
-  return isTainted(State, val, Kind);
+  return !getTaintedSymbols(State, S, LCtx, Kind).empty();
 }
----------------
dkrupp wrote:
> steakhal wrote:
> > TBH I'm not sure if I like that now we allocate unbounded amount of times (because of `getTaintedSymbols()` is recursive and returns by value), where we previously did not.
> > 
> > What we could possibly do is to compute the elements of this sequence lazily.
> > I'm thinking of the `llvm::mapped_iterator`, but I'm not sure if it's possible to have something like that as a return type as it might encode the map function in the type or something like that.
> > Anyway, I'm just saying that it would be nice to not do more than it's necessary, and especially not allocate a lot of short-lived objects there.
> > 
> > Do you think there is a way to have the cake and eat it too?
> > 
> > ---
> > 
> > I did some investigation, and one could get pretty far in the implementation, and maybe even complete it but it would be really complicated as of now. Maybe we could revisit this subject when we have coroutines.
> > 
> > So, I would suggest to have two sets of APIs:
> >  - the usual `isTainted(.) -> bool`
> >  - and a `getTaintedSymbols(.) -> vector<Sym>`
> > The important point would be that the `isTainted()` version would not eagerly collect all tainted sub-syms but return on finding the first one.
> > While, the `getTaintedSymbols()` would collect eagerly all of them, as its name suggests.
> > 
> > Imagine if `getTaintedSymbolsImpl()` had an extra flag like `bool returnAfterFirstMatch`. This way `isTainted()` can call it like that. While in the other case, the parameter would be `false`, and eagerly collect all symbols.
> > 
> > This is probably the best of both worlds, as it prevents `isTainted` from doing extra work and if we need to iterate over the tainted symbols, we always iterate over all of them, so doing it lazily wouldn't gain us much in that case anyway.
> > As a bonus, the user-facing API would be self-descriptive.
> > 
> > WDYT?
> Good idea. I implemented the early return option in getTaintedSymbols(). This is used now by the isTainted() function.
First I wanted to avoid the getTaintedSymbols()->getTaintedSymbolsImpl() proxy calls as it is too bloated IMHO.
But I see your point that it is safer. So I changed it.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144269/new/

https://reviews.llvm.org/D144269



More information about the cfe-commits mailing list