[PATCH] D144269: [Analyzer] Show "taint originated here" note of alpha.security.taint.TaintPropagation checker at the correct place

Balázs Benics via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Feb 27 02:49:05 PST 2023


steakhal added a comment.

First and foremost, I want to deeply apologize about my rushed response. When I say that the `Taint originated here` note remained, I **wrongly** draw conclusions. Won't happen again.

__Taint-flow IDs__:
I would challenge that we need a flow ID (number) because for me the tainted symbol already has a unique ID which should be suitable for this purpose.
What I can see is that it would be useful to know directly what was the first tainted symbols of any given tainted symbol. (`SymbolData -> SymbolData` mapping)
This is pretty much analog with what you implemented by piggybacking on the taint kind. Although, I'd argue that having an explicit mapping would be cleaner, but I can see that it's more on personal preference.
In addition to that, I think there is value in minimizing the number of places where we introduce such static counters, so I would adwise against that unless we have a good reason for doing so.

I'm thinking that although it's handy to have the originally tainted symbol directly at the error-node at hand, I'm still not sure if that couldn't be calculated and tracked back to the place where we introduced taint.

  int n;
  scanf("%d", &n); // binds $1, not important
  int v = mytaintsource(n); // taint originated here, returns $2
  int z = taintprop(v); // taint propagated here, returns $3
  int x = 42 / z; // 42 div $3

__Soundness of the back-propagation__:
The back-propagation is always in-sync **given that** the state-transition of introducing taint **also attaches** the `NoteTag` explaining what it did and why.
This basically means that after we call `addTaint()` we must also add a `NoteTag` when we call `addTransition()`. Under these circumstances I find it easier to argue that the back-propagation is consistent (sound). If a specific checker (other than the `GenericTaintChecker`) models taint, it should stick to the rules described and emit the right `NoteTag`. That way even downstream checkers would play nicely with the taint-tracking and benefit from it.

__Interestingness__:
The concerns seem valid about that the set of interesting symbols could be larger than the actually required (and desired) set. However, I wouldn't worried about this much unless we have an example on which we can continue discussing that this is a real concern.
What I can see is that as-of-now we use the interestingness notion for this, and deviating or introducing something else would introduce complexity. So, I'm not strictly against having something else, but I would lean towards using interestingness here as well, unless we have clear-cut examples demonstrating the need for something more sophisticated.

Finally, I'd like to thank you for investing your time into this subject. We really should have done it much earlier. Without these similar improvements like this one, it's just half way done. Thank you.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144269/new/

https://reviews.llvm.org/D144269



More information about the cfe-commits mailing list