[cfe-dev] [analyzer] Tracking values for generating bug reports

Fri Aug 9 18:48:49 PDT 2019

On 8/8/19 2:05 PM, Kristóf Umann wrote:
> I got another round! The following new run names were added:
>
> .* + all non-control dependency changes: [1, 2] applied
> .*AFTER tracking conditions + all non-control dependency changes: [1, 
> 2] applied, "track-conditions" enabled
> .*AFTER tracking conditions FINAL: "track-conditions" enabled, [1-8] 
> applied (also with ~10 or so NFC patches)
> .*AFTER tracking conditions FINAL + DEBUG notes: "track-conditions" 
> and "track-conditions-debug" enabled, [1-8] applied (also with ~10 or 
> so NFC patches)
> [1] https://reviews.llvm.org/D64232[analyzer] Prune calls to functions 
> with linear CFGs that return a non-zero constrained value
> [2] https://reviews.llvm.org/D64287 [analyzer] Track the right hand 
> side of the last store regardless of its value
> [3] https://reviews.llvm.org/D64270[analyzer][NFC] Prepare visitors 
> for different tracking kinds
>
> [4] https://reviews.llvm.org/D64272[analyzer] Note last writes to a 
> condition only in a nested stackframe
>
> [5] https://reviews.llvm.org/D65287 [analyzer][CFG] Don't track the 
> condition of asserts
> [6] https://reviews.llvm.org/D65575 [analyzer] Mention whether an 
> event is about a condition in a bug report part 1
> [7] https://reviews.llvm.org/D65724 [analyzer] Don't make 
> ConditionBRVisitor events prunable when the condition is an 
> interesting field
> [8] https://reviews.llvm.org/D65725 [analyzer] Mention whether an 
> event is about a condition in a bug report part 2
>
> Conclusions:
>
> Shockingly, no condition tracking related changes were made in the 
> FINAL run for either Bitcoin or Xerces, so we only have LLVM/Clang to 
> talk about. This time, I added comments, which, after clicking on a 
> report, can be retrieved by clicking on 'Comments' in the upper right 
> corner. These comments are far more precise than the notes I left 
> previously.
>
> http://cc.elte.hu:15001/GSoC2019-July-5/#is-unique=on&run=LLVM%2FClang%20WITH%20condition%20tracking%20FINAL&review-status=Confirmed&review-status=False%20positive&review-status=Intentional&detection-status=New&detection-status=Reopened&detection-status=Unresolved&tab=LLVM%2FClang%20WITH%20condition%20tracking%20FINAL
>
> The negatives:
> 1. Extra notes to functions following this pattern were very-very common:
>
> bool a(int b) {
>   return b == 0;
>   // Assuming 'b' is equal to 0
> // Returning the value 1, which will be (a part of a) condition
> }
>
> void f(int b) {
>   int *x = 0;
>   if (a(b)) // call to 'a', returning from 'a'
>     *x = 5; // nullptr dereference
> }
>
> These notes are totally meaningless, and they aren't pruned out by [1] 
> because the assumption notes are non-prunable (they explain a 
> variable, b in this case, that is marked as interesting, as the return 
> value is tracked by ReturnVisitor). A possible solution would be to 
> prune linear functions in ConditionBRVisitor as well similarly to [1], 
> but then we'd lose the assumption note. Instead, moving the 
> non-prunable assumption note to the actual collapse point would be 
> better, but a quick look at the generated ExplodedGraph reveals that 
> this no longer an bug report construction issue: the collapse point is 
> in fact at the return statement, rather then the condition.
>
> So, I think the bug report should always explain how the analyzer 
> thinks. Ideally, this should be identical to how the bug could be 
> reproduced, but if it is not (in this case the assumption should be 
> made when a(b) is the condition), maybe the logic of the analysis 
> should be changed instead.
>
> Example: Notes 19-23, 
> http://cc.elte.hu:15001/GSoC2019-July-5/#is-unique=on&run=LLVM%2FClang%20WITH%20condition%20tracking%20FINAL&review-status=Confirmed&review-status=False%20positive&review-status=Intentional&detection-status=New&detection-status=Reopened&detection-status=Unresolved&tab=LLVM%2FClang%20WITH%20condition%20tracking%20FINAL&reportHash=8c536e75492d2588898e03178a704efb&report=12672&subtab=8c536e75492d2588898e03178a704efb

This is due to the eagerly-assume behavior. The state split does 
actually happen at == or ! rather than at the if-statement.

I guess you could check if the node is wearing an "ExprEngine: Eagerly 
Assume" tag.

> 2. I didn't come across this too often, but it happened:
>
> int getInt(); int getInt2();
>
> bool a() {
>   return getInt2() && getInt();
>   // Returning value, which will be (a part of a) condition
> }
>
> void f() {
>   int *x = 0;
>   if (a()) // call to 'a', returning from 'a'
>     *x = 5; // nullptr dereference
> }
>
> This note wasn't pruned because a() has in fact more than 3 basic 
> blocks, but it really doesn't change anything. I guess we *could* 
> prune it, but I'm leaning on accepting it as a sacrifice.
>
> Example: Notes 65-68, 
> http://cc.elte.hu:15001/GSoC2019-July-5/#is-unique=on&run=LLVM%2FClang%20WITH%20condition%20tracking%20FINAL&review-status=Confirmed&review-status=False%20positive&review-status=Intentional&detection-status=New&detection-status=Reopened&detection-status=Unresolved&tab=LLVM%2FClang%20WITH%20condition%20tracking%20FINAL&reportHash=8cc2c55ab79efc36fcd2b1725648c2c3&report=12678&subtab=8cc2c55ab79efc36fcd2b1725648c2c3

I guess it depends:

- If we're assuming that the condition is true, and the function returns 
A && B, the note is indeed redundant.
- If we're assuming that the condition is true, and the function returns 
A || B, it's worth mentioning whether we think that A is true or we 
think that B is true.
- If we're assuming that the condition is false, and the function 
returns A && B, it's worth mentioning whether we think that A is false 
or we think that B is false.
- If we're assuming that the condition is false, and the function 
returns A || B, the note is indeed redundant.

>
> 3. Some function calls are lost due to retrieving information from 
> nested stack frames only, rather than different stack frames. I only 
> encountered this when the condition variable was invalidated: The 
> "AFTER tracking conditions WITH improved moderation" runs display the 
> path to the invalidation point, yet the FINAL runs do not.
>
> Example: Displaying the invalidation point on note 9: 
> http://cc.elte.hu:15001/GSoC2019-July-5/#is-unique=on&run=LLVM%2FClang%20WITH%20condition%20tracking%20FINAL&review-status=Confirmed&review-status=False%20positive&review-status=Intentional&detection-status=New&detection-status=Reopened&detection-status=Unresolved&tab=LLVM%2FClang%20WITH%20condition%20tracking%20FINAL&reportHash=1d6a5c6fcca79b12b760b41924c9c2c3&report=7774&subtab=1d6a5c6fcca79b12b760b41924c9c2c3
>
> 4. Calls to operator!= in foreach loops -- Are these necessary? Is 
> there any, even articial cases where this is actually meaningful? This 
> is pretty much the only non-constructive addition of notes that are 
> introduced only because of condition tracking.
>
> Example: Notes 13-17, 22-26, 37-41: 
> http://cc.elte.hu:15001/GSoC2019-July-5/#is-unique=on&run=LLVM%2FClang%20WITH%20condition%20tracking%20FINAL&review-status=Confirmed&review-status=False%20positive&review-status=Intentional&detection-status=New&detection-status=Reopened&detection-status=Unresolved&tab=LLVM%2FClang%20WITH%20condition%20tracking%20FINAL&reportHash=50cc2bed223dc4ba62f46d0665346a5e&report=12743&subtab=50cc2bed223dc4ba62f46d0665346a5e

Mmm, yeah, i'd rather skip them. We don't make much sense out of 
iterators anyway. I wish IteratorChecker could provide us with better notes.

> Positives:
>
> In general, the wording as the analyses were run may be imperfect, but 
> they indicate what the message is about very precisely. The path of 
> execution is much better explained, and even if the extra function 
> call isn't meaningful, the note messages state clearly that we're only 
> talking about a condition, I can skip them without being bothered by 
> it too much. I am given extra information without the cost of 
> polluting the rest of the report.
>
> Here is a collection of my favorites, though I found most of the extra 
> notes nice:
>
> 1. Here, we encounter a nullpointer dereference issue. Op0 is assumed 
> to be equal to Op1, but only due to condition tracking do we explain 
> that Op1 is believed to be null, and THAT is why this assumption (Op1 
> == Op0) is so important!
>
> Notes 23-31: 
> http://cc.elte.hu:15001/GSoC2019-July-5/#is-unique=on&run=LLVM%2FClang%20WITH%20condition%20tracking%20FINAL&review-status=Confirmed&review-status=False%20positive&review-status=Intentional&detection-status=New&detection-status=Reopened&detection-status=Unresolved&tab=LLVM%2FClang%20WITH%20condition%20tracking%20FINAL&reportHash=6b5f3e43213b64eaef4a25a249c59baf&report=12955&subtab=6b5f3e43213b64eaef4a25a249c59baf
>

The note is indeed extremely helpful, but the warning itself should have 
been suppressed by the "inlined defensive check" heuristic. I guess the 
heuristic doesn't know how to carry over to the other side of the `==`.

> 2. The seldom addition of extra "Returning value, which will be (a 
> part of a) condition" notes, love them!
>
> Example: Note 13 of 
> http://cc.elte.hu:15001/GSoC2019-July-5/#is-unique=on&run=LLVM%2FClang%20WITH%20condition%20tracking%20FINAL&review-status=Confirmed&review-status=False%20positive&review-status=Intentional&detection-status=New&detection-status=Reopened&detection-status=Unresolved&tab=LLVM%2FClang%20WITH%20condition%20tracking%20FINAL&reportHash=5086ed6620709edc2d9a5ed4609e186d&report=12888&subtab=5086ed6620709edc2d9a5ed4609e186d
>
> 3. This one is weird, I'm not yet sure how this note came about, but 
> yet again, it isn't clear AT ALL in the original report why CountS 
> will be a source of a null dereference error. The extra note tells us 
> that CountS is non-null, but its pointee is.
>
> Note 19: 
> http://cc.elte.hu:15001/GSoC2019-July-5/#is-unique=on&run=LLVM%2FClang%20WITH%20condition%20tracking%20FINAL&review-status=Confirmed&review-status=False%20positive&review-status=Intentional&detection-status=New&detection-status=Reopened&detection-status=Unresolved&tab=LLVM%2FClang%20WITH%20condition%20tracking%20FINAL&reportHash=356d88cb062474fdf17584bf1451bffc&report=13412&subtab=356d88cb062474fdf17584bf1451bffc

You're talking about the "Assuming pointer value is null" thingy? I 
suspect that it's a bug and the note is in fact meaningless. See also 
https://reviews.llvm.org/D65889?id=214186#inline-590619 - i've seen them 
before but i suspect that i suddenly see more of them these days. 
Generally, i still don't understand the report after seeing these notes. 
Even if the notes tell anything about the pointee, the code they're 
attached to doesn't.

I also don't understand why did we bring in the copy-constructor. Was 
there an invalidation of all heap going on that also touched *CountS?

> Also, adding the actual mail that was lost ;)
>
> On Wed, 24 Jul 2019 at 13:57, Kristóf Umann <dkszelethus at gmail.com 
> <mailto:dkszelethus at gmail.com>> wrote:
>
>
>
>     On Wed, 24 Jul 2019 at 04:29, Artem Dergachev <noqnoqneo at gmail.com
>     <mailto:noqnoqneo at gmail.com>> wrote:
>
>         Xerces - DOMRangeImpl.cpp: Ugh, are those "Value assigned
>         to..." notes triggered by invalidation? I have a plan of my
>         own to wipe such notes out (or at least formulate them more
>         carefully), so i guess it's not an action item for you (nor
>         much of a regression), but still ugh. Another thing to think
>         about for those is to suppress the warning entirely when a
>         region that stores is important condition is first assumed to
>         contain false, then invalidated, then assumed to contain true.
>
>
>     Totally agreed.
>
>         Xerces - XMLString.hpp: Mmm, i guess passing unconstrained
>         values as parameters is also not very meaningful. It doesn't
>         look like it's a *big* problem, but i guess it's worth
>         considering fixing eventually.
>
>
>     In this particular case, partly due to me not being familiar with
>     the code, I found the note helpful, but in LLVM there were plenty
>     of reports where I didn't feel this way, so I'll get that fixed.
>
>         > Assert conditions are not yet ignored, I have a neat
>         solution locally
>
>         Just to make sure - do you have a test case with control flow
>         within the condition, eg. `assert((a || b) && "the universe is
>         collapsing")`?
>
>
>     I don't :) Thanks!
>
>         On 7/23/19 12:49 PM, Kristóf Umann wrote:
>>         I'm happy to delight you all with some more results! I was
>>         far more strict on my judgement this time around, marking a
>>         report good when I genuinely believed that the note made the
>>         report better.
>>
>>         All the runs were made on monorepo commit
>>         db4e335b8643056f5afa2a0d8d175d95b1fb7e28 with the following
>>         patches applied:
>>
>>         BEFORE tracking condition:  Without any changes being
>>         made, default CodeChecker settings.
>>         AFTER tracking conditions: Without any changes being made,
>>         track-conditions set to true.
>>         AFTER tracking conditions + DEBUG notes: Without any changes
>>         being made, track-conditions and track-conditions-debug set
>>         to true.
>>         AFTER linear pruning (or similar run name): [1] applied,
>>         track-conditions set to true
>>         AFTER tracking conditions WITH improved moderation: [1, 2, 3,
>>         4] applied, track-conditions set to true (stupid name but
>>         it's gonna stay like this for now :) )
>>         AFTER tracking conditions WITH improved moderation: [1, 2, 3,
>>         4] applied, track-conditions and track-conditions-debug  set
>>         to true
>>         [1] https://reviews.llvm.org/D64232[analyzer] Prune calls to
>>         functions with linear CFGs that return a non-zero constrained
>>         value
>>         [2] https://reviews.llvm.org/D64287 [analyzer] Track the
>>         right hand side of the last store regardless of its value
>>
>>         [3] https://reviews.llvm.org/D64270[analyzer][NFC] Prepare
>>         visitors for different tracking kinds
>>
>>         [4] https://reviews.llvm.org/D64272[analyzer] Note last
>>         writes to a condition only in a nested stackframe
>>
>>
>>         LLVM/Clang (no longer analyzing clang-tools-extra):
>>         http://cc.elte.hu:15001/GSoC2019-July-5/#run=LLVM%2FClang%20AFTER%20condition%20tracking%20WITH%20improved%20moderation&review-status=Confirmed&review-status=False%20positive&review-status=Intentional&detection-status=New&detection-status=Reopened&detection-status=Unresolved&tab=LLVM%2FClang%20AFTER%20condition%20tracking%20WITH%20improved%20moderation&is-unique=on
>>
>>         Bitcoin:
>>         http://cc.elte.hu:15001/GSoC2019-July-5/#run=Bitcoint%20AFTER%20condition%20tracking%20WITH%20improved%20moderation&review-status=Confirmed&review-status=False%20positive&review-status=Intentional&detection-status=New&detection-status=Reopened&detection-status=Unresolved&tab=Bitcoint%20AFTER%20condition%20tracking%20WITH%20improved%20moderation&is-unique=on
>>
>>         Xerces:
>>         http://cc.elte.hu:15001/GSoC2019-July-5/#run=Xerces%20AFTER%20condition%20tracking%20WITH%20improved%20moderation&review-status=Confirmed&review-status=False%20positive&review-status=Intentional&detection-status=New&detection-status=Reopened&detection-status=Unresolved&tab=Xerces%20AFTER%20condition%20tracking%20WITH%20improved%20moderation&is-unique=on
>>
>>         Some conclusions:
>>
>>           * [4] Actually notes last writes in /different/, not
>>             /nested/ stack frames. Gotta fix that.
>>           * [1] Returning an unconstrained value when the function is
>>             linear is *never* meaningful. Not for conditions, not for
>>             anything else. Gotta fix that too. It's worth talking
>>             about whether we should prune these for conditions even
>>             if the function isn't linear.
>>           * [1] This one gets rid of almost all the operator bool
>>             calls. Maybe in the rare case it doesn't, we should
>>             preserve them? I always planned to get rid of calls to
>>             operator bool, but now I'm hesitant.
>>           * Assert conditions are not yet ignored, I have a neat
>>             solution locally, but I haven't published or tested it yet.
>>
>
>     One more thing: If the condition is a binary operator, like (B ||
>     A) is assumed true because A is true, or (A && B) is assumed false
>     because A is false, we should only track A.
>
>>         There honestly isn't that much I have on my mind, I think
>>         after the next round of analyses we should be good! I really
>>         should've made an analysis with all the non-condition
>>         tracking changes to get an even better perspective, so I'll
>>         do that next time.
>>
>>         On Tue, 25 Jun 2019 at 04:00, Artem Dergachev
>>         <noqnoqneo at gmail.com <mailto:noqnoqneo at gmail.com>> wrote:
>>
>>             Yup, looks much better! Let's actually dig through other
>>             projects a bit, but results on LLVM look fairly perfect
>>             to me.
>>
>>
>>             > ASTDiagnostic.cpp
>>
>>             Why wasn't the extra tracking removed in the moderate
>>             mode? There doesn't seem to be an obvious overwrite for
>>             `kind`. Is it reacting to an invalidation?
>>
>>         The report is now untouched!
>>
>>             > DominanceFrontierImpl.h
>>
>>             Why does it track so much more expressions in the
>>             moderate mode (even if not producing any actual nodes for
>>             them)? Is it because you changed the de-duplication
>>             method from by-expr to by-node in between?
>>
>>         I'll make a testcase for that. Honestly, I'm thinking of
>>         reimplementing the visitor as well, so that I'll collect the
>>         control dependencies in a set when the visitor object is
>>         created, and track them at BlockExit locations, that should
>>         limit it, because yeah, I believe they originate from the
>>         de-duplication.
>>
>>             > CGObjCGNU.cpp
>>
>>             I'd like to think more about this one. It doesn't look to
>>             me that the new notes are helpful here, for the reason
>>             that the problem is pretty much entirely between the
>>             definition of OID and the use, while all of the notes are
>>             outside that range. The report wouldn't become less valid
>>             (for a certain definition of valid) if we simply omit
>>             everything except the last three notes:
>>               - 'OID' is initialized to a null pointer value
>>               - Loop body skipped when range is empty
>>               - Called ++ object pointer is null
>>             I don't see any immediate solutions here; i guess the
>>             right thing to do would be to simply learn how to chop
>>             off the irrelevant beginning of the path.
>>
>>
>>         This time around, BodyFarm.cpp is in the same boat. I really
>>         just made a fundamentally bad (like, really bad) report even
>>         worse.
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190809/7abc8ffd/attachment.html>