[PATCH] D144269: [Analyzer] Show "taint originated here" note of alpha.security.taint.TaintPropagation checker at the correct place

Fri Apr 21 06:59:20 PDT 2023

steakhal requested changes to this revision.
steakhal added a comment.
This revision now requires changes to proceed.

I didn't go through the whole revision this time, but I think the next steps are already clear for the next round.
My impression was that I might not expressed my intent and expectations about the directions of the next step.
I hope I managed this time. Let me know if you have questions.

================
Comment at: clang/include/clang/StaticAnalyzer/Checkers/Taint.h:82-103
+/// Returns the tainted Symbols for a given Statement and state.
+std::vector<SymbolRef> getTaintedSymbols(ProgramStateRef State, const Stmt *S,
+                                         const LocationContext *LCtx,
+                                         TaintTagType Kind = TaintTagGeneric,
+                                         bool returnFirstOnly = false);
+
+/// Returns the tainted Symbols for a given SVal and state.
----------------
The overloads having the extra `ReturnFirstOnly` parameter shouldn't be visible here in the header.
That is an implementation detail that no users should know about.
Note that having a single default argument overload potentially doubles the variations the user might need to keep in mind when choosing the right one. So there is value in simplicity.

================
Comment at: clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp:248-254
+  std::vector<SymbolRef> TaintedSyms =
+      getTaintedSymbols(errorState, TaintedSVal);
+  // Mark all tainted symbols interesting
+  // to track back the propagation of taintedness.
+  for (auto Sym : TaintedSyms) {
+    BR->markInteresting(Sym);
+  }
----------------

================
Comment at: clang/lib/StaticAnalyzer/Checkers/DivZeroChecker.cpp:62
+                                    CheckerContext &C,
+                                    std::vector<SymbolRef> TaintedSyms) const {
+  if (ExplodedNode *N = C.generateErrorNode(StateZero)) {
----------------

================
Comment at: clang/lib/StaticAnalyzer/Checkers/DivZeroChecker.cpp:108-109
+  if ((stateNotZero && stateZero)) {
+    std::vector<SymbolRef> taintedSyms = getTaintedSymbols(C.getState(), *DV);
+    if (!taintedSyms.empty()) {
+      reportTaintBug("Division by a tainted value, possibly zero", stateZero, C,
----------------

================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:863-868
+      std::vector<SymbolRef> TaintedSyms =
+          getTaintedSymbols(State, Call.getReturnValue());
+      if (!TaintedSyms.empty()) {
+        TaintedSymbols.push_back(TaintedSyms[0]);
+        TaintedIndexes.push_back(ArgNum);
+      }
----------------

================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:875-879
+      std::vector<SymbolRef> TaintedSyms = getTaintedSymbols(State, *V);
+      if (!TaintedSyms.empty()) {
+        TaintedSymbols.push_back(TaintedSyms[0]);
+        TaintedIndexes.push_back(ArgNum);
+      }
----------------
In these cases, the code would acquire all the tainted subsymbols, which then we throw away and keep only the first one.
This is why I suggested the approach I did I'm my last review.

================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:936-940
+    IsMatching =
+        IsMatching || (PropSrcArgs.contains(I) &&
+                       isTaintedOrPointsToTainted(State, C.getSVal(E)));
+    std::optional<SVal> TaintedSVal =
+        getTaintedPointeeOrPointer(State, C.getSVal(E));
----------------
Here `getTaintedPointeeOrPointer` would be called two times, unnecessarily.

================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:947-948
+      if (!TaintedArgSyms.empty()) {
+        TaintedSymbols.insert(TaintedSymbols.begin(), TaintedArgSyms.begin(),
+                              TaintedArgSyms.end());
+        TaintedIndexes.push_back(I);
----------------

================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:1009-1010
   assert(E);
-  std::optional<SVal> TaintedSVal{getTaintedPointeeOrPointer(C, C.getSVal(E))};
+  std::optional<SVal> TaintedSVal{
+      getTaintedPointeeOrPointer(C.getState(), C.getSVal(E))};

----------------

================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:1019-1023
+    std::vector<SymbolRef> TaintedSyms =
+        getTaintedSymbols(C.getState(), *TaintedSVal);
+    for (auto TaintedSym : TaintedSyms) {
+      report->markInteresting(TaintedSym);
+    }
----------------

================
Comment at: clang/lib/StaticAnalyzer/Checkers/Taint.cpp:149
                       const LocationContext *LCtx, TaintTagType Kind) {
-  SVal val = State->getSVal(S, LCtx);
-  return isTainted(State, val, Kind);
+  return !getTaintedSymbols(State, S, LCtx, Kind, true).empty();
 }
----------------
We usually pass booleans by "name".

================
Comment at: clang/lib/StaticAnalyzer/Checkers/Taint.cpp:294
+      std::vector<SymbolRef> TaintedCasts =
+          getTaintedSymbols(State, SC->getOperand(), Kind);
+      TaintedSymbols.insert(TaintedSymbols.begin(), TaintedCasts.begin(),
----------------
If `returnFirstOnly` is `true`, this `getTaintedSymbols()` call would still eagerly (needlessly) collect all of the symbols.
I'd recommend propagating the `returnFirstOnly` parameter to the recursive calls to avoid this problem.
I also encourage you to make use of the `llvm::append_range()` whenever makes sense.

================
Comment at: clang/lib/StaticAnalyzer/Checkers/VLASizeChecker.cpp:230-232
+  std::vector<SymbolRef> TaintedSyms = getTaintedSymbols(State, TaintedSVal);
+  for (auto Sym : TaintedSyms)
+    report->markInteresting(Sym);
----------------
Ah, how awesome it would be to have a `markInteresting(llvm::ArrayRef<SymbolRef>)` overload.

================
Comment at: clang/test/Analysis/taint-diagnostic-visitor.c:53-67
+//Tests if the originated note is correctly placed even if the path is
+//propagating through variables and expressions
+char* taintDiagnosticPropagation(){
+  char *pathbuf;
+  char *pathlist=getenv("PATH"); // expected-note {{Taint originated here}}
+                                 // expected-note at -1 {{Taint propagated to the return value}}
+  if (pathlist){ // expected-note {{Assuming 'pathlist' is non-null}}
----------------
steakhal wrote:
> I know this is subjective, but I'd suggest to reformat the tests to match LLVM style guidelines, unless the formatting is important for the test.
> Consistency helps the reader and reviewer, as code and tests are read many more times than written.
> 
> This applies to the rest of the touched tests.
Originally I meant this to the rest of the test cases you change or add part of this patch. I hope it clarifies.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144269/new/

https://reviews.llvm.org/D144269