[PATCH] D144269: [Analyzer] Show "taint originated here" note of alpha.security.taint.TaintPropagation checker at the correct place

Balázs Benics via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Apr 5 09:04:42 PDT 2023


steakhal added a comment.

Looks even better. Only minor concerns remained, mostly about style and suggestions of llvm utilities.



================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:129-130
 /// Given a pointer/reference argument, return the value it refers to.
-std::optional<SVal> getPointeeOf(const CheckerContext &C, SVal Arg) {
+std::optional<SVal> getPointeeOf(ASTContext &ASTCtx,
+                                 const ProgramStateRef State, SVal Arg) {
   if (auto LValue = Arg.getAs<Loc>())
----------------
BTW I don't know but `State->getStateManager().getContext()` can give you an `ASTContext`. And we tend to not put `const` to variable declarations. See [[ https://releases.llvm.org/4.0.0/tools/clang/tools/extra/docs/clang-tidy/checks/readability-avoid-const-params-in-decls.html | readability-avoid-const-params-in-decls ]]

In other places we tend to refer to `ASTContext` by the `ACtx` I think.
We also prefer const refs over mutable refs. Is the mutable ref justified for this case?


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:159
+/// when the return value, or the outgoing parameters are tainted.
+const NoteTag *taintOriginTrackerTag(CheckerContext &C,
+                                     std::vector<SymbolRef> TaintedSymbols,
----------------
My bad. In LLVM style we use `UpperCamelCase` for variable names.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:173-175
+    if (TaintedSymbols.empty()) {
+      return "Taint originated here";
+    }
----------------
Generally, in LLVM style we don't put braces to single block statements unless it would hurt readability, which I don't think applies here.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:176-180
+    for (auto Sym : llvm::enumerate(TaintedSymbols)) {
+      LLVM_DEBUG(llvm::dbgs() << "Taint Propagated from argument "
+                              << TaintedArgs.at(Sym.index()) + 1 << "\n");
+      BR.markInteresting(Sym.value());
+    }
----------------
I was also bad with this recommendation.
I think we can now use structured bindings to get the index and value right there, like:
`for (auto [Idx, Sym] : llvm::enumerate(TaintedSymbols))`
[[ https://github.com/llvm/llvm-project/blob/main/llvm/docs/ProgrammersManual.rst#enumerate | See ]]


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:203
+    int nofTaintedArgs = 0;
+    for (auto Sym : llvm::enumerate(TaintedSymbols)) {
+      if (BR.isInteresting(Sym.value())) {
----------------
Same here about structured bindings.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:209-214
+          if (nofTaintedArgs == 0)
+            Out << "Taint propagated to argument "
+                << TaintedArgs.at(Sym.index()) + 1;
+          else
+            Out << ", " << TaintedArgs.at(Sym.index()) + 1;
+          nofTaintedArgs++;
----------------
I'd recommend using `llvm::interleaveComma()` in such cases.
You can probably get rid of `nofTaintedArgs` as well - by using this function.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:211
+            Out << "Taint propagated to argument "
+                << TaintedArgs.at(Sym.index()) + 1;
+          else
----------------
For clang diagnostics we usually use ordinary suffixes like `{st,nd,rd,th}`. It would be nice to align with the rest of the clang diagnostics on this.
It would require a bit of work on the wording though, I admit.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:213
+          else
+            Out << ", " << TaintedArgs.at(Sym.index()) + 1;
+          nofTaintedArgs++;
----------------
I believe this branch is uncovered by tests.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:221
+    }
+    return std::string(Out.str());
+  });
----------------
I think since you explicitly specify the return type of the lambda, you could omit the spelling of `std::string` here.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:863-864
       State = addTaint(State, Call.getReturnValue());
+      SymbolRef TaintedSym = isTainted(State, Call.getReturnValue());
+      if (TaintedSym) {
+        TaintedSymbols.push_back(TaintedSym);
----------------
We tend to fuse such declarations:
I've seen other cases like this elsewhere. Please check.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:875-878
+      // FIXME: The call argument may be a complex expression
+      // referring to multiple tainted variables.
+      // Now we generate notes and track back only one of them.
+      SymbolRef TaintedSym = isTainted(State, *V);
----------------
You could iterate over the symbol dependencies of the SymExpr (of the `*V` SVal).

```lang=c++
SymbolRef PointeeAsSym = V->getAsSymbol();
// eee, can it be null? Sure it can. See isTainted(Region),... for those cases we would need to descend and check their symbol dependencies.
for (SymbolRef SubSym : llvm::make_range(PointeeAsSym->symbol_begin(), PointeeAsSym->symbol_end())) {
  // TODO: check each if it's also tainted, and update the `TaintedSymbols` accordingly, IDK.
}
```
Something like this should work for most cases (except when `*V` refers to a tainted region instead of a symbol), I think.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:162
+                                  const CallEvent& Call) {
+  const LocationContext* LC = Call.getCalleeStackFrame(0);
+
----------------
dkrupp wrote:
> steakhal wrote:
> > If the `Call` parameter is only used for acquiring the `LocationContext`, wouldn't it be more descriptive to directly pass the `LocationContext` to the function instead?
> > I'm also puzzled that we use `getCalleeStackFrame` here. I rarely ever see this function, so I'm a bit worried if this pick was intentional. That we pass the `0` as the `BlockCount` argument only reinforces this instinct.
> The call.getCalleeStackFrame(0) gets the location context of the actual call that we are analyzing (in the pre or postcall), and that's what we need to mark interesting. It is intentionally used like this. I changed the parameter to locationcontext as use suggested.
Okay.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/Taint.cpp:204
       // If this is a SymbolDerived with a tainted parent, it's also tainted.
-      if (isTainted(State, SD->getParentSymbol(), Kind))
-        return true;
+      if (SymbolRef TSR = isTainted(State, SD->getParentSymbol(), Kind))
+        return TSR;
----------------
Here we still have the `TSR` token.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/VLASizeChecker.cpp:220-228
+    if (Kind == VLA_Tainted)
+      BT.reset(new BugType(this,
+                           "Dangerous variable-length array (VLA) declaration",
+                           categories::TaintedData));
+    else
+      BT.reset(new BugType(this,
+                           "Dangerous variable-length array (VLA) declaration",
----------------
dkrupp wrote:
> steakhal wrote:
> > Why don't we use a distinct BugType for this?
> You mean a new bug type instances? Would there be an advantage for that? Seemed to be simpler this way. To distinguish identify the tainted reports with the bug category.
> You mean a new bug type instances? Would there be an advantage for that? Seemed to be simpler this way. To distinguish identify the tainted reports with the bug category.

I never checked how `BugTypes` constitute to bugreport construction, but my gut instinct suggests that we should have two separate instances like we frequently do for other checkers.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144269/new/

https://reviews.llvm.org/D144269



More information about the cfe-commits mailing list