[PATCH] D144269: [Analyzer] Show "taint originated here" note of alpha.security.taint.TaintPropagation checker at the correct place
Balázs Benics via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Mon Apr 3 07:08:33 PDT 2023
steakhal added a comment.
In D144269#4237539 <https://reviews.llvm.org/D144269#4237539>, @dkrupp wrote:
> This is a totally rewritten version of the patch which solely relies on the existing "interestingness" utility to track back the taint propagation. (And does not introduce a new FlowID in the ProgramState as requested in the reviews.)
>
> -The new version also places a Note, when the taintedness is propagated to an argument or to a return value. So it should be easier for the user to follow how the taint information is spreading.
> -"The taint originated here" is printed correctly at the taint source function, which introduces taintedness. (Main goal of this patch.)
>
> Implementation:
> -The createTaintPreTag() function places a NoteTag at the taint propagation function calls, if taintedness is propagated. Then at report creation, the tainted arguments are marked interesting if propagated taintedness is relevant for the bug report.
>
> - The isTainted() function is extended to return the actually tainted SymbolRef. This is important to be able to consistently mark relevant symbol interesting which carries the taintedness in a complex expression.
So this is how you circumvent introducing "transitive interestingness", because now you know which symbol to track.
> -createTaintPostTag(..) function places a NoteTag to the taint generating function calls to mark them interesting if they are relevant for a taintedness report. So if they propagated taintedness to interesting symbol(s).
>
> The tests are passing and the reports on the open source projects are much better understandable than before (twin, tmux, curl):
>
> https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=curl_curl-7_66_0_dkrupp_taint_origin_fix_new&run=tmux_2.6_dkrupp_taint_origin_fix_new&run=twin_v0.8.1_dkrupp_taint_origin_fix_new&is-unique=on&diff-type=New&checker-msg=%2auntrusted%2a&checker-msg=Out%20of%20bound%20memory%20access%20%28index%20is%20tainted%29
I've looked at the results you attached and they look good to me.
Do you have an example where two tainted values are contributing to the same bug? Something like:
n <- read();
m <- read();
malloc(n+m)
Will both of these values tracked back? What do the notes look like?
---
All in all, I'm pleased to see the improvements. It looks much better now IMO.
Using two NoteTags cleans up the implementation quite a bit, kudos.
I don't think there are major problems with this implementation, so I decided to spew your code with my nitpicks :D
Remember to clang-format your code. See `clang/tools/clang-format/clang-format-diff.py`.
And there are a few overloads of `getNoteTag()`, and there could be a better fit for usecases; you should decide.
I also find it difficult to track how a variable got tainted across assignments and computations like in this <https://codechecker-demo.eastus.cloudapp.azure.com/Default/report-detail?run=curl_curl-7_66_0_dkrupp_taint_origin_fix_new&run=tmux_2.6_dkrupp_taint_origin_fix_new&run=twin_v0.8.1_dkrupp_taint_origin_fix_new&is-unique=on&diff-type=New&checker-msg=%2auntrusted%2a&checker-msg=Out%20of%20bound%20memory%20access%20%28index%20is%20tainted%29&report-id=1549451&report-hash=c509c5c3af850f823db37c471550c248&report-filepath=%2afake_ntlm.c> case.
This observation is completely orthogonal to your patch, I'm just noting it. it was bad previously as well.
Maybe we could have a visitor for explaining the taint tracking across assignments and computations to complement the NoteTag.
I hope I didn't miss much this time :D
================
Comment at: clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp:238-244
+ if (kind == OOB_Tainted)
+ BT.reset(
+ new BugType(this, "Out-of-bound access", categories::TaintedData));
+ else
+ BT.reset(
+ new BugType(this, "Out-of-bound access", categories::LogicError));
+ }
----------------
================
Comment at: clang/lib/StaticAnalyzer/Checkers/DivZeroChecker.cpp:45
-void DivZeroChecker::reportBug(
- const char *Msg, ProgramStateRef StateZero, CheckerContext &C,
- std::unique_ptr<BugReporterVisitor> Visitor) const {
+void DivZeroChecker::reportBug(const char *Msg, std::string Category,
+ ProgramStateRef StateZero, CheckerContext &C,
----------------
It feels odd to have both `const char*` and a `std::string` on the same line.
Should we update `const char*` to a more sophisticated type?
I'm thinking of `StringRef`. It seems like that type should be used for the `Category` as well since the `PathSensitiveBugReport` constructor takes that, so we don't need to have an owning type here.
================
Comment at: clang/lib/StaticAnalyzer/Checkers/DivZeroChecker.cpp:91-96
+ SymbolRef TSR = isTainted(C.getState(), *DV);
+ if ((stateNotZero && stateZero && TSR)) {
+ reportBug("Division by a tainted value, possibly zero",
+ categories::TaintedData, stateZero, C, TSR);
return;
}
----------------
================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:138-139
/// Also considers stdin as a taint source.
std::optional<SVal> getTaintedPointeeOrPointer(const CheckerContext &C,
+ const ProgramStateRef State,
SVal Arg) {
----------------
I'm not sure about passing both the `CheckerContext` and the `State`.
The `CheckerContext` already encapsulates a `State`, which opens up the possibilities for misuse.
For example, the `getPointeeOf()` is called in this function, and that will eventually call `C.getState()` under the hood. So, for me it feels like a bad API design.
What we could do instead, is to pass an `ASTContext` and a `State`; resolving this discrepancy.
Could you please check if this is a real concern?
================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:158
+/// when the return value, or the outgoing parameters are tainted.
+const NoteTag *createTaintPreTag(CheckerContext &C,
+ std::vector<SymbolRef> TaintedSymbols,
----------------
This function should be an implementation detail, as such I wonder if we should make it `static`.
How about naming this function differently?
I'm thinking of `taintOriginTrackerTag`, IDK.
================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:162
+ const CallEvent& Call) {
+ const LocationContext* LC = Call.getCalleeStackFrame(0);
+
----------------
If the `Call` parameter is only used for acquiring the `LocationContext`, wouldn't it be more descriptive to directly pass the `LocationContext` to the function instead?
I'm also puzzled that we use `getCalleeStackFrame` here. I rarely ever see this function, so I'm a bit worried if this pick was intentional. That we pass the `0` as the `BlockCount` argument only reinforces this instinct.
================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:165
+ const NoteTag *InjectionTag = C.getNoteTag(
+ [TaintedSymbols, TaintedArgs,LC](PathSensitiveBugReport &BR) -> std::string {
+ SmallString<256> Msg;
----------------
It should consume the `TaintedSymbols` and `TaintedArgs` variables, as such you should `std::move` out from the original parameters like this:
```
[TaintedSymbols = std::move(TaintedSymbols), TaintedArgs = std::move(TaintedArgs)](){...}
```
================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:169
+ // We give diagnostics only for taint related reports
+ if (!BR.isInteresting(LC) ||
+ BR.getBugType().getCategory() != categories::TaintedData) {
----------------
What does the first half of this condition guard against?
Do you have a test for it to demonstrate?
================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:173-176
+ if (TaintedSymbols.empty()){
+ Out << "Taint originated here";
+ return std::string(Out.str());
+ }
----------------
================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:177-183
+ int i = 0;
+ for (SymbolRef SR : TaintedSymbols) {
+ LLVM_DEBUG(llvm::dbgs() << "Taint Propagated from argument"
+ << TaintedArgs.at(i) << "\n");
+ BR.markInteresting(SR);
+ i++;
+ }
----------------
================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:192
+/// when the return value, or the outgoing parameters are tainted.
+const NoteTag *createTaintPostTag(CheckerContext &C,
+ std::vector<SymbolRef> TaintedSymbols,
----------------
How about calling this `taintPropagationExplainerTag`?
================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:197
+ const LocationContext* LC = Call.getCalleeStackFrame(0);
+
+ const NoteTag *InjectionTag = C.getNoteTag(
----------------
Please assert that the size of `TaintedSymbols` must be the same as `TaintedArgs`. Same in the other function.
================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:198
+
+ const NoteTag *InjectionTag = C.getNoteTag(
+ [TaintedSymbols, TaintedArgs, LC](PathSensitiveBugReport &BR) -> std::string {
----------------
================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:214
+ << TaintedArgs.at(i) << "\n");
+ Out << "Taint propagated to argument " << TaintedArgs.at(i);
+ } else {
----------------
So, if `TaintedSymbols.size() > 1`, then the note message will look weird.
Could you please have a test for this?
================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:871
+ }else{
+ LLVM_DEBUG(llvm::dbgs() << "Strange. Return value not tainted.\n");
+ LLVM_DEBUG(Call.getReturnValue().dumpToStream(llvm::dbgs()));
----------------
I cannot see a test against the "Strange" string. Is this dead code?
Same for the other block.
================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:919-922
+ //Add taintedness to stdin parameters
+ if (isStdin(C.getSVal(E),C.getASTContext())){
+ State = addTaint(State, C.getSVal(E));
+ }
----------------
I just want to get it confirmed that this hunk is unrelated to your change per-say. Is it?
BTW I don't mind this change being part of this patch, rather the opposite. Finally, we will have it.
================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:942
bool IsMatching = PropSrcArgs.isEmpty();
- ForEachCallArg(
- [this, &C, &IsMatching, &State](ArgIdxTy I, const Expr *E, SVal) {
- IsMatching = IsMatching || (PropSrcArgs.contains(I) &&
- isTaintedOrPointsToTainted(E, State, C));
- });
+ const NoteTag *InjectionTag = nullptr;
+
----------------
I prefer to move declarations close to their uses.
================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:1003
State = State->set<TaintArgsOnPostVisit>(C.getStackFrame(), Result);
- C.addTransition(State);
+ InjectionTag = createTaintPreTag(C,TaintedSymbols,TaintedIndexes,Call);
+ C.addTransition(State, InjectionTag);
----------------
================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:1025-1028
+ SymbolRef TSR = isTainted(C.getState(),*TaintedSVal);
+ if (TSR)
+ report->markInteresting(TSR);
+
----------------
================
Comment at: clang/lib/StaticAnalyzer/Checkers/Taint.cpp:169-173
+ SymbolRef TSR;
+ if (TSR = isTainted(State, ER->getSuperRegion(), K))
+ return TSR;
+ else if (TSR = isTainted(State, ER->getIndex(), K))
+ return TSR;
----------------
I'd probably swap the two branches, so that the index would be tracked if both the region and the index are tainted.
I also wonder if this edge-case could be tested at all.
================
Comment at: clang/lib/StaticAnalyzer/Checkers/Taint.cpp:228-230
+ if (SymbolRef TSR =
+ isTainted(State, SRV->getRegion(), Kind))
+ return TSR;
----------------
What does `TSR` abbreviate? I would find `TaintedSym` more descriptive.
================
Comment at: clang/lib/StaticAnalyzer/Checkers/VLASizeChecker.cpp:220-228
+ if (Kind == VLA_Tainted)
+ BT.reset(new BugType(this,
+ "Dangerous variable-length array (VLA) declaration",
+ categories::TaintedData));
+ else
+ BT.reset(new BugType(this,
+ "Dangerous variable-length array (VLA) declaration",
----------------
Why don't we use a distinct BugType for this?
================
Comment at: clang/test/Analysis/taint-diagnostic-visitor.c:46
// expected-note at -1 {{Taint originated here}}
+ // expected-note at -2 {{Taint propagated to argument 1}}
int vla[x]; // expected-warning {{Declared variable-length array (VLA) has tainted size}}
----------------
I'd suggest to count from 1 instead of 0 when referring to Nth arguments.
You can also use the `llvm::getOrdinalSuffix(N)` to get nicer messages.
We already count from 1 in the std library checker.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D144269/new/
https://reviews.llvm.org/D144269
More information about the cfe-commits
mailing list