[PATCH] D144269: [Analyzer] Show "taint originated here" note of alpha.security.taint.TaintPropagation checker at the correct place

Daniel Krupp via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Feb 24 05:44:24 PST 2023


dkrupp added a comment.

> TaintBugReport is brilliant and we already have a precedent for subclassing BugReport in another checker. However I'm somewhat worried that once we start doing more of this, we'll eventually end up with multiple inheritance situations when the report needs multiple kinds of information. So at a glance my approach with a "generic data map" in bugreport objects looks a bit more future-proof to me. Also a bit easier to set up, no need to deal with custom RTTI.

Adding a data map (like a string->sval map) to the `PathSensitiveBugreport` instead of relying on dynamic casting sounds an easy addition. I will update the patch with this. Or you specifically mean this kind of datamap ?  `typedef llvm::ImmutableMap<void*, void*>   GenericDataMap;` (`ProgramState.h:74`) I guess it should not be immutable…

> I guess my main point is, there shouldn't be a need to assist tracking by adding extra information to the program state. Information in the state should ideally be "material" to program execution, "tangible", it has to describe something that's actually stored somewhere in memory (either by directly defining it, or by constraining it). In particular, if two nodes result in indistinguishable future behavior of the program, we're supposed to merge them; but any "immaterial" bits of information in the state will prevent that from happening.

@NoQ aha! Now I see where  you are coming from! If an SVal is tainted on both analysis branches, but their taint flow value is different (meaning that they carry taint values from different taint sources), then they cannot be merged which causes inefficiency.
I understand the generic principle, but I wonder how frequent would that be in practice. I would think not too much, because taint sources are uncommon. Especially having multiple taint sources in the same source file/Translation Unit (only that creates different taint flow ids).

> In our case it should be enough to have the lambda for propagation method ask "Hey, is this freshly produced propagation target value relevant to this specific report?" and if yes, mark the corresponding propagation source value as relevant to the report as well; also emit a note and "consume" the mark on the target value. Such chain of local decisions can easily replace the global taint flow identifier, and it's more flexible because this way the flow doesn't need to be "linear", it may branch in various ways and that's ok.

Taint propagation is not only handled in the `GenericTaintChecker:892`, where we calculate that the taintedness should propagate from function argument `x` to `y` or return value, but also it spreads in peculiar ways within expressions, from subregion to parent region etc. handled in the `addtaint(..)` and `addPartialTaint(..)` functions in `Taint.cpp`. What your proposed solution would essentially mean that we would need to implement the taint propagation in backward direction too. I think this design would be fragile and difficult to maintain (especially if taint propagation would change in the future).

I definitely don’t have the full picture here, so if you think that the sval backtracking is the better way, because of the potential performance penalty of the taintflow solution with the merges, I will go down that road and start to work on an alternative patch.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144269/new/

https://reviews.llvm.org/D144269



More information about the cfe-commits mailing list