[PATCH] D144269: [Analyzer] Show "taint originated here" note of alpha.security.taint.TaintPropagation checker at the correct place

Balázs Benics via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Apr 3 07:08:33 PDT 2023


steakhal added a comment.

In D144269#4237539 <https://reviews.llvm.org/D144269#4237539>, @dkrupp wrote:

> This is a totally rewritten version of the patch which solely relies on the existing "interestingness" utility to track back the taint propagation.  (And  does not introduce a new FlowID in the ProgramState as requested in the reviews.)
>
> -The new version also places a Note, when the taintedness is propagated to an argument or to a return value. So it should be easier for the user to follow how the taint information is spreading. 
> -"The taint originated here" is printed correctly at the taint source function, which introduces taintedness. (Main goal of this patch.)
>
> Implementation:
> -The createTaintPreTag() function places a NoteTag at the taint propagation function calls, if taintedness is propagated. Then at report creation, the tainted arguments are marked interesting if propagated taintedness is relevant for the bug report.
>
> - The isTainted() function is extended to return the actually tainted SymbolRef. This is important to be able to consistently mark relevant symbol interesting which carries the taintedness in a complex expression.

So this is how you circumvent introducing "transitive interestingness", because now you know which symbol to track.

> -createTaintPostTag(..) function places a NoteTag to the taint generating function calls to mark them interesting if they are relevant for a taintedness report. So if they propagated taintedness to interesting symbol(s).
>
> The tests are passing and the reports on the open source projects are much better understandable than before (twin, tmux, curl):
>
> https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=curl_curl-7_66_0_dkrupp_taint_origin_fix_new&run=tmux_2.6_dkrupp_taint_origin_fix_new&run=twin_v0.8.1_dkrupp_taint_origin_fix_new&is-unique=on&diff-type=New&checker-msg=%2auntrusted%2a&checker-msg=Out%20of%20bound%20memory%20access%20%28index%20is%20tainted%29

I've looked at the results you attached and they look good to me.

Do you have an example where two tainted values are contributing to the same bug? Something like:

  n <- read();
  m <- read();
  malloc(n+m)

Will both of these values tracked back? What do the notes look like?

---

All in all, I'm pleased to see the improvements. It looks much better now IMO.
Using two NoteTags cleans up the implementation quite a bit, kudos.
I don't think there are major problems with this implementation, so I decided to spew your code with my nitpicks :D

Remember to clang-format your code. See `clang/tools/clang-format/clang-format-diff.py`.
And there are a few overloads of `getNoteTag()`, and there could be a better fit for usecases; you should decide.

I also find it difficult to track how a variable got tainted across assignments and computations like in this <https://codechecker-demo.eastus.cloudapp.azure.com/Default/report-detail?run=curl_curl-7_66_0_dkrupp_taint_origin_fix_new&run=tmux_2.6_dkrupp_taint_origin_fix_new&run=twin_v0.8.1_dkrupp_taint_origin_fix_new&is-unique=on&diff-type=New&checker-msg=%2auntrusted%2a&checker-msg=Out%20of%20bound%20memory%20access%20%28index%20is%20tainted%29&report-id=1549451&report-hash=c509c5c3af850f823db37c471550c248&report-filepath=%2afake_ntlm.c> case.
This observation is completely orthogonal to your patch, I'm just noting it. it was bad previously as well.
Maybe we could have a visitor for explaining the taint tracking across assignments and computations to complement the NoteTag.

I hope I didn't miss much this time :D



================
Comment at: clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp:238-244
+    if (kind == OOB_Tainted)
+      BT.reset(
+          new BugType(this, "Out-of-bound access", categories::TaintedData));
+    else
+      BT.reset(
+          new BugType(this, "Out-of-bound access", categories::LogicError));
+  }
----------------



================
Comment at: clang/lib/StaticAnalyzer/Checkers/DivZeroChecker.cpp:45
 
-void DivZeroChecker::reportBug(
-    const char *Msg, ProgramStateRef StateZero, CheckerContext &C,
-    std::unique_ptr<BugReporterVisitor> Visitor) const {
+void DivZeroChecker::reportBug(const char *Msg, std::string Category,
+                               ProgramStateRef StateZero, CheckerContext &C,
----------------
It feels odd to have both `const char*` and a `std::string` on the same line.
Should we update `const char*` to a more sophisticated type?

I'm thinking of `StringRef`. It seems like that type should be used for the `Category` as well since the `PathSensitiveBugReport` constructor takes that, so we don't need to have an owning type here.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/DivZeroChecker.cpp:91-96
+  SymbolRef TSR = isTainted(C.getState(), *DV);
+  if ((stateNotZero && stateZero && TSR)) {
+    reportBug("Division by a tainted value, possibly zero",
+              categories::TaintedData, stateZero, C, TSR);
     return;
   }
----------------



================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:138-139
 /// Also considers stdin as a taint source.
 std::optional<SVal> getTaintedPointeeOrPointer(const CheckerContext &C,
+                                               const ProgramStateRef State,
                                                SVal Arg) {
----------------
I'm not sure about passing both the `CheckerContext` and the `State`.
The `CheckerContext` already encapsulates a `State`, which opens up the possibilities for misuse.

For example, the `getPointeeOf()` is called in this function, and that will eventually call `C.getState()` under the hood. So, for me it feels like a bad API design.
What we could do instead, is to pass an `ASTContext` and a `State`; resolving this discrepancy.

Could you please check if this is a real concern?


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:158
+/// when the return value, or the outgoing parameters are tainted.
+const NoteTag *createTaintPreTag(CheckerContext &C,
+                                  std::vector<SymbolRef> TaintedSymbols,
----------------
This function should be an implementation detail, as such I wonder if we should make it `static`.

How about naming this function differently?
I'm thinking of `taintOriginTrackerTag`, IDK.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:162
+                                  const CallEvent& Call) {
+  const LocationContext* LC = Call.getCalleeStackFrame(0);
+
----------------
If the `Call` parameter is only used for acquiring the `LocationContext`, wouldn't it be more descriptive to directly pass the `LocationContext` to the function instead?
I'm also puzzled that we use `getCalleeStackFrame` here. I rarely ever see this function, so I'm a bit worried if this pick was intentional. That we pass the `0` as the `BlockCount` argument only reinforces this instinct.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:165
+  const NoteTag *InjectionTag = C.getNoteTag(
+      [TaintedSymbols, TaintedArgs,LC](PathSensitiveBugReport &BR) -> std::string {
+        SmallString<256> Msg;
----------------
It should consume the `TaintedSymbols` and `TaintedArgs` variables, as such you should `std::move` out from the original parameters like this:
```
[TaintedSymbols = std::move(TaintedSymbols), TaintedArgs = std::move(TaintedArgs)](){...}
```


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:169
+        // We give diagnostics only for taint related reports
+        if (!BR.isInteresting(LC) ||
+            BR.getBugType().getCategory() != categories::TaintedData) {
----------------
What does the first half of this condition guard against?
Do you have a test for it to demonstrate?


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:173-176
+        if (TaintedSymbols.empty()){
+          Out << "Taint originated here";
+          return std::string(Out.str());
+        }
----------------



================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:177-183
+        int i = 0;
+        for (SymbolRef SR : TaintedSymbols) {
+          LLVM_DEBUG(llvm::dbgs() << "Taint Propagated from argument"
+                                    << TaintedArgs.at(i) << "\n");
+          BR.markInteresting(SR);
+          i++;
+          }
----------------



================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:192
+/// when the return value, or the outgoing parameters are tainted.
+const NoteTag *createTaintPostTag(CheckerContext &C,
+                                  std::vector<SymbolRef> TaintedSymbols,
----------------
How about calling this `taintPropagationExplainerTag`?


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:197
+  const LocationContext* LC = Call.getCalleeStackFrame(0);
+
+  const NoteTag *InjectionTag = C.getNoteTag(
----------------
Please assert that the size of `TaintedSymbols` must be the same as `TaintedArgs`. Same in the other function.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:198
+
+  const NoteTag *InjectionTag = C.getNoteTag(
+      [TaintedSymbols, TaintedArgs, LC](PathSensitiveBugReport &BR) -> std::string {
----------------



================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:214
+                                      << TaintedArgs.at(i) << "\n");
+              Out << "Taint propagated to argument " << TaintedArgs.at(i);
+            } else {
----------------
So, if `TaintedSymbols.size() > 1`, then the note message will look weird.
Could you please have a test for this?


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:871
+      }else{
+         LLVM_DEBUG(llvm::dbgs() << "Strange. Return value not tainted.\n");
+         LLVM_DEBUG(Call.getReturnValue().dumpToStream(llvm::dbgs()));
----------------
I cannot see a test against the "Strange" string. Is this dead code?
Same for the other block.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:919-922
+    //Add taintedness to stdin parameters
+    if (isStdin(C.getSVal(E),C.getASTContext())){
+      State = addTaint(State, C.getSVal(E));
+    }
----------------
I just want to get it confirmed that this hunk is unrelated to your change per-say. Is it?
BTW I don't mind this change being part of this patch, rather the opposite. Finally, we will have it.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:942
   bool IsMatching = PropSrcArgs.isEmpty();
-  ForEachCallArg(
-      [this, &C, &IsMatching, &State](ArgIdxTy I, const Expr *E, SVal) {
-        IsMatching = IsMatching || (PropSrcArgs.contains(I) &&
-                                    isTaintedOrPointsToTainted(E, State, C));
-      });
+  const NoteTag *InjectionTag = nullptr;
+
----------------
I prefer to move declarations close to their uses.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:1003
     State = State->set<TaintArgsOnPostVisit>(C.getStackFrame(), Result);
-  C.addTransition(State);
+  InjectionTag = createTaintPreTag(C,TaintedSymbols,TaintedIndexes,Call);
+  C.addTransition(State, InjectionTag);
----------------



================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:1025-1028
+    SymbolRef TSR = isTainted(C.getState(),*TaintedSVal);
+    if (TSR)
+      report->markInteresting(TSR);
+
----------------



================
Comment at: clang/lib/StaticAnalyzer/Checkers/Taint.cpp:169-173
+    SymbolRef TSR;
+    if (TSR = isTainted(State, ER->getSuperRegion(), K))
+      return TSR;
+    else if (TSR = isTainted(State, ER->getIndex(), K))
+      return TSR;
----------------
I'd probably swap the two branches, so that the index would be tracked if both the region and the index are tainted.
I also wonder if this edge-case could be tested at all.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/Taint.cpp:228-230
+      if (SymbolRef TSR =
+              isTainted(State, SRV->getRegion(), Kind))
+        return TSR;
----------------
What does `TSR` abbreviate? I would find `TaintedSym` more descriptive.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/VLASizeChecker.cpp:220-228
+    if (Kind == VLA_Tainted)
+      BT.reset(new BugType(this,
+                           "Dangerous variable-length array (VLA) declaration",
+                           categories::TaintedData));
+    else
+      BT.reset(new BugType(this,
+                           "Dangerous variable-length array (VLA) declaration",
----------------
Why don't we use a distinct BugType for this?


================
Comment at: clang/test/Analysis/taint-diagnostic-visitor.c:46
                    // expected-note at -1 {{Taint originated here}}
+                   // expected-note at -2 {{Taint propagated to argument 1}}
   int vla[x]; // expected-warning {{Declared variable-length array (VLA) has tainted size}}
----------------
I'd suggest to count from 1 instead of 0 when referring to Nth arguments.
You can also use the `llvm::getOrdinalSuffix(N)` to get nicer messages.
We already count from 1 in the std library checker.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144269/new/

https://reviews.llvm.org/D144269



More information about the cfe-commits mailing list