[PATCH] D144269: [Analyzer] Show "taint originated here" note of alpha.security.taint.TaintPropagation checker at the correct place

Balázs Benics via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Apr 19 02:42:55 PDT 2023


steakhal requested changes to this revision.
steakhal added a comment.
This revision now requires changes to proceed.

Nice improvement!
I only have minor nitpicks and some recommendations for the taint API.



================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:205
+        BR.markInteresting(CallLocation);
+        if (TaintedArgs.at(Idx) != ReturnValueIndex) {
+          LLVM_DEBUG(llvm::dbgs() << "Taint Propagated to argument "
----------------
We usually use the `operator[]` on vector instead of the `at()`. When in doubt, we assert that the index is in `idx < size()`.
Apply this for all `.at()` uses.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:209-214
+          if (nofTaintedArgs == 0)
+            Out << "Taint propagated to argument "
+                << TaintedArgs.at(Sym.index()) + 1;
+          else
+            Out << ", " << TaintedArgs.at(Sym.index()) + 1;
+          nofTaintedArgs++;
----------------
dkrupp wrote:
> steakhal wrote:
> > I'd recommend using `llvm::interleaveComma()` in such cases.
> > You can probably get rid of `nofTaintedArgs` as well - by using this function.
> I chose another solution. I hope that is ok too.
Looks good. I'm not expecting many cases propagating to multiple arguments anyway.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/Taint.cpp:147-150
 bool taint::isTainted(ProgramStateRef State, const Stmt *S,
                       const LocationContext *LCtx, TaintTagType Kind) {
-  SVal val = State->getSVal(S, LCtx);
-  return isTainted(State, val, Kind);
+  return !getTaintedSymbols(State, S, LCtx, Kind).empty();
 }
----------------
TBH I'm not sure if I like that now we allocate unbounded amount of times (because of `getTaintedSymbols()` is recursive and returns by value), where we previously did not.

What we could possibly do is to compute the elements of this sequence lazily.
I'm thinking of the `llvm::mapped_iterator`, but I'm not sure if it's possible to have something like that as a return type as it might encode the map function in the type or something like that.
Anyway, I'm just saying that it would be nice to not do more than it's necessary, and especially not allocate a lot of short-lived objects there.

Do you think there is a way to have the cake and eat it too?

---

I did some investigation, and one could get pretty far in the implementation, and maybe even complete it but it would be really complicated as of now. Maybe we could revisit this subject when we have coroutines.

So, I would suggest to have two sets of APIs:
 - the usual `isTainted(.) -> bool`
 - and a `getTaintedSymbols(.) -> vector<Sym>`
The important point would be that the `isTainted()` version would not eagerly collect all tainted sub-syms but return on finding the first one.
While, the `getTaintedSymbols()` would collect eagerly all of them, as its name suggests.

Imagine if `getTaintedSymbolsImpl()` had an extra flag like `bool returnAfterFirstMatch`. This way `isTainted()` can call it like that. While in the other case, the parameter would be `false`, and eagerly collect all symbols.

This is probably the best of both worlds, as it prevents `isTainted` from doing extra work and if we need to iterate over the tainted symbols, we always iterate over all of them, so doing it lazily wouldn't gain us much in that case anyway.
As a bonus, the user-facing API would be self-descriptive.

WDYT?


================
Comment at: clang/lib/StaticAnalyzer/Checkers/Taint.cpp:265-268
+      std::vector<SymbolRef> TaintedRegions =
+          getTaintedSymbols(State, SRV->getRegion(), Kind);
+      TaintedSymbols.insert(TaintedSymbols.begin(), TaintedRegions.begin(),
+                            TaintedRegions.end());
----------------
For such constructs, I would prefer this.


================
Comment at: clang/test/Analysis/taint-diagnostic-visitor.c:12-13
+char* getenv( const char* env_var );
+size_t strlen( const char* str );
+void *malloc(size_t size );
+void free( void *ptr );
----------------
The [[ https://buildkite.com/llvm-project/premerge-checks/builds/146760#01877fd9-2f3a-4f8d-9ffa-6e017eb883c5 | premerge bots ]] are complaining about these two lines on Windows:
```
error: 'warning' diagnostics seen but not expected: 
  File C:\ws\w8\llvm-project\premerge-checks\clang\test\Analysis\taint-diagnostic-visitor.c Line 12: incompatible redeclaration of library function 'strlen'
  File C:\ws\w8\llvm-project\premerge-checks\clang\test\Analysis\taint-diagnostic-visitor.c Line 13: incompatible redeclaration of library function 'malloc'
error: 'note' diagnostics seen but not expected: 
  File C:\ws\w8\llvm-project\premerge-checks\clang\test\Analysis\taint-diagnostic-visitor.c Line 12: 'strlen' is a builtin with type 'unsigned long long (const char *)'
  File C:\ws\w8\llvm-project\premerge-checks\clang\test\Analysis\taint-diagnostic-visitor.c Line 13: 'malloc' is a builtin with type 'void *(unsigned long long)'
4 errors generated.
```

I think it's because `size_t` should be defined as `unsigned long long` on `x86_64`. This also means that you should pin the target to `x86_64` to satisfy this test on all platforms.


================
Comment at: clang/test/Analysis/taint-diagnostic-visitor.c:53-67
+//Tests if the originated note is correctly placed even if the path is
+//propagating through variables and expressions
+char* taintDiagnosticPropagation(){
+  char *pathbuf;
+  char *pathlist=getenv("PATH"); // expected-note {{Taint originated here}}
+                                 // expected-note at -1 {{Taint propagated to the return value}}
+  if (pathlist){ // expected-note {{Assuming 'pathlist' is non-null}}
----------------
I know this is subjective, but I'd suggest to reformat the tests to match LLVM style guidelines, unless the formatting is important for the test.
Consistency helps the reader and reviewer, as code and tests are read many more times than written.

This applies to the rest of the touched tests.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144269/new/

https://reviews.llvm.org/D144269



More information about the cfe-commits mailing list