[cfe-dev] [analyzer] Tracking values for generating bug reports
Artem Dergachev via cfe-dev
cfe-dev at lists.llvm.org
Fri Aug 9 18:48:49 PDT 2019
On 8/8/19 2:05 PM, Kristóf Umann wrote:
> I got another round! The following new run names were added:
>
> .* + all non-control dependency changes: [1, 2] applied
> .*AFTER tracking conditions + all non-control dependency changes: [1,
> 2] applied, "track-conditions" enabled
> .*AFTER tracking conditions FINAL: "track-conditions" enabled, [1-8]
> applied (also with ~10 or so NFC patches)
> .*AFTER tracking conditions FINAL + DEBUG notes: "track-conditions"
> and "track-conditions-debug" enabled, [1-8] applied (also with ~10 or
> so NFC patches)
> [1] https://reviews.llvm.org/D64232[analyzer] Prune calls to functions
> with linear CFGs that return a non-zero constrained value
> [2] https://reviews.llvm.org/D64287 [analyzer] Track the right hand
> side of the last store regardless of its value
> [3] https://reviews.llvm.org/D64270[analyzer][NFC] Prepare visitors
> for different tracking kinds
>
> [4] https://reviews.llvm.org/D64272[analyzer] Note last writes to a
> condition only in a nested stackframe
>
> [5] https://reviews.llvm.org/D65287 [analyzer][CFG] Don't track the
> condition of asserts
> [6] https://reviews.llvm.org/D65575 [analyzer] Mention whether an
> event is about a condition in a bug report part 1
> [7] https://reviews.llvm.org/D65724 [analyzer] Don't make
> ConditionBRVisitor events prunable when the condition is an
> interesting field
> [8] https://reviews.llvm.org/D65725 [analyzer] Mention whether an
> event is about a condition in a bug report part 2
>
> Conclusions:
>
> Shockingly, no condition tracking related changes were made in the
> FINAL run for either Bitcoin or Xerces, so we only have LLVM/Clang to
> talk about. This time, I added comments, which, after clicking on a
> report, can be retrieved by clicking on 'Comments' in the upper right
> corner. These comments are far more precise than the notes I left
> previously.
>
> http://cc.elte.hu:15001/GSoC2019-July-5/#is-unique=on&run=LLVM%2FClang%20WITH%20condition%20tracking%20FINAL&review-status=Confirmed&review-status=False%20positive&review-status=Intentional&detection-status=New&detection-status=Reopened&detection-status=Unresolved&tab=LLVM%2FClang%20WITH%20condition%20tracking%20FINAL
>
> The negatives:
> 1. Extra notes to functions following this pattern were very-very common:
>
> bool a(int b) {
> return b == 0;
> // Assuming 'b' is equal to 0
> // Returning the value 1, which will be (a part of a) condition
> }
>
> void f(int b) {
> int *x = 0;
> if (a(b)) // call to 'a', returning from 'a'
> *x = 5; // nullptr dereference
> }
>
> These notes are totally meaningless, and they aren't pruned out by [1]
> because the assumption notes are non-prunable (they explain a
> variable, b in this case, that is marked as interesting, as the return
> value is tracked by ReturnVisitor). A possible solution would be to
> prune linear functions in ConditionBRVisitor as well similarly to [1],
> but then we'd lose the assumption note. Instead, moving the
> non-prunable assumption note to the actual collapse point would be
> better, but a quick look at the generated ExplodedGraph reveals that
> this no longer an bug report construction issue: the collapse point is
> in fact at the return statement, rather then the condition.
>
> So, I think the bug report should always explain how the analyzer
> thinks. Ideally, this should be identical to how the bug could be
> reproduced, but if it is not (in this case the assumption should be
> made when a(b) is the condition), maybe the logic of the analysis
> should be changed instead.
>
> Example: Notes 19-23,
> http://cc.elte.hu:15001/GSoC2019-July-5/#is-unique=on&run=LLVM%2FClang%20WITH%20condition%20tracking%20FINAL&review-status=Confirmed&review-status=False%20positive&review-status=Intentional&detection-status=New&detection-status=Reopened&detection-status=Unresolved&tab=LLVM%2FClang%20WITH%20condition%20tracking%20FINAL&reportHash=8c536e75492d2588898e03178a704efb&report=12672&subtab=8c536e75492d2588898e03178a704efb
This is due to the eagerly-assume behavior. The state split does
actually happen at == or ! rather than at the if-statement.
I guess you could check if the node is wearing an "ExprEngine: Eagerly
Assume" tag.
> 2. I didn't come across this too often, but it happened:
>
> int getInt(); int getInt2();
>
> bool a() {
> return getInt2() && getInt();
> // Returning value, which will be (a part of a) condition
> }
>
> void f() {
> int *x = 0;
> if (a()) // call to 'a', returning from 'a'
> *x = 5; // nullptr dereference
> }
>
> This note wasn't pruned because a() has in fact more than 3 basic
> blocks, but it really doesn't change anything. I guess we *could*
> prune it, but I'm leaning on accepting it as a sacrifice.
>
> Example: Notes 65-68,
> http://cc.elte.hu:15001/GSoC2019-July-5/#is-unique=on&run=LLVM%2FClang%20WITH%20condition%20tracking%20FINAL&review-status=Confirmed&review-status=False%20positive&review-status=Intentional&detection-status=New&detection-status=Reopened&detection-status=Unresolved&tab=LLVM%2FClang%20WITH%20condition%20tracking%20FINAL&reportHash=8cc2c55ab79efc36fcd2b1725648c2c3&report=12678&subtab=8cc2c55ab79efc36fcd2b1725648c2c3
I guess it depends:
- If we're assuming that the condition is true, and the function returns
A && B, the note is indeed redundant.
- If we're assuming that the condition is true, and the function returns
A || B, it's worth mentioning whether we think that A is true or we
think that B is true.
- If we're assuming that the condition is false, and the function
returns A && B, it's worth mentioning whether we think that A is false
or we think that B is false.
- If we're assuming that the condition is false, and the function
returns A || B, the note is indeed redundant.
>
> 3. Some function calls are lost due to retrieving information from
> nested stack frames only, rather than different stack frames. I only
> encountered this when the condition variable was invalidated: The
> "AFTER tracking conditions WITH improved moderation" runs display the
> path to the invalidation point, yet the FINAL runs do not.
>
> Example: Displaying the invalidation point on note 9:
> http://cc.elte.hu:15001/GSoC2019-July-5/#is-unique=on&run=LLVM%2FClang%20WITH%20condition%20tracking%20FINAL&review-status=Confirmed&review-status=False%20positive&review-status=Intentional&detection-status=New&detection-status=Reopened&detection-status=Unresolved&tab=LLVM%2FClang%20WITH%20condition%20tracking%20FINAL&reportHash=1d6a5c6fcca79b12b760b41924c9c2c3&report=7774&subtab=1d6a5c6fcca79b12b760b41924c9c2c3
>
> 4. Calls to operator!= in foreach loops -- Are these necessary? Is
> there any, even articial cases where this is actually meaningful? This
> is pretty much the only non-constructive addition of notes that are
> introduced only because of condition tracking.
>
> Example: Notes 13-17, 22-26, 37-41:
> http://cc.elte.hu:15001/GSoC2019-July-5/#is-unique=on&run=LLVM%2FClang%20WITH%20condition%20tracking%20FINAL&review-status=Confirmed&review-status=False%20positive&review-status=Intentional&detection-status=New&detection-status=Reopened&detection-status=Unresolved&tab=LLVM%2FClang%20WITH%20condition%20tracking%20FINAL&reportHash=50cc2bed223dc4ba62f46d0665346a5e&report=12743&subtab=50cc2bed223dc4ba62f46d0665346a5e
Mmm, yeah, i'd rather skip them. We don't make much sense out of
iterators anyway. I wish IteratorChecker could provide us with better notes.
> Positives:
>
> In general, the wording as the analyses were run may be imperfect, but
> they indicate what the message is about very precisely. The path of
> execution is much better explained, and even if the extra function
> call isn't meaningful, the note messages state clearly that we're only
> talking about a condition, I can skip them without being bothered by
> it too much. I am given extra information without the cost of
> polluting the rest of the report.
>
> Here is a collection of my favorites, though I found most of the extra
> notes nice:
>
> 1. Here, we encounter a nullpointer dereference issue. Op0 is assumed
> to be equal to Op1, but only due to condition tracking do we explain
> that Op1 is believed to be null, and THAT is why this assumption (Op1
> == Op0) is so important!
>
> Notes 23-31:
> http://cc.elte.hu:15001/GSoC2019-July-5/#is-unique=on&run=LLVM%2FClang%20WITH%20condition%20tracking%20FINAL&review-status=Confirmed&review-status=False%20positive&review-status=Intentional&detection-status=New&detection-status=Reopened&detection-status=Unresolved&tab=LLVM%2FClang%20WITH%20condition%20tracking%20FINAL&reportHash=6b5f3e43213b64eaef4a25a249c59baf&report=12955&subtab=6b5f3e43213b64eaef4a25a249c59baf
>
The note is indeed extremely helpful, but the warning itself should have
been suppressed by the "inlined defensive check" heuristic. I guess the
heuristic doesn't know how to carry over to the other side of the `==`.
> 2. The seldom addition of extra "Returning value, which will be (a
> part of a) condition" notes, love them!
>
> Example: Note 13 of
> http://cc.elte.hu:15001/GSoC2019-July-5/#is-unique=on&run=LLVM%2FClang%20WITH%20condition%20tracking%20FINAL&review-status=Confirmed&review-status=False%20positive&review-status=Intentional&detection-status=New&detection-status=Reopened&detection-status=Unresolved&tab=LLVM%2FClang%20WITH%20condition%20tracking%20FINAL&reportHash=5086ed6620709edc2d9a5ed4609e186d&report=12888&subtab=5086ed6620709edc2d9a5ed4609e186d
>
> 3. This one is weird, I'm not yet sure how this note came about, but
> yet again, it isn't clear AT ALL in the original report why CountS
> will be a source of a null dereference error. The extra note tells us
> that CountS is non-null, but its pointee is.
>
> Note 19:
> http://cc.elte.hu:15001/GSoC2019-July-5/#is-unique=on&run=LLVM%2FClang%20WITH%20condition%20tracking%20FINAL&review-status=Confirmed&review-status=False%20positive&review-status=Intentional&detection-status=New&detection-status=Reopened&detection-status=Unresolved&tab=LLVM%2FClang%20WITH%20condition%20tracking%20FINAL&reportHash=356d88cb062474fdf17584bf1451bffc&report=13412&subtab=356d88cb062474fdf17584bf1451bffc
You're talking about the "Assuming pointer value is null" thingy? I
suspect that it's a bug and the note is in fact meaningless. See also
https://reviews.llvm.org/D65889?id=214186#inline-590619 - i've seen them
before but i suspect that i suddenly see more of them these days.
Generally, i still don't understand the report after seeing these notes.
Even if the notes tell anything about the pointee, the code they're
attached to doesn't.
I also don't understand why did we bring in the copy-constructor. Was
there an invalidation of all heap going on that also touched *CountS?
> Also, adding the actual mail that was lost ;)
>
> On Wed, 24 Jul 2019 at 13:57, Kristóf Umann <dkszelethus at gmail.com
> <mailto:dkszelethus at gmail.com>> wrote:
>
>
>
> On Wed, 24 Jul 2019 at 04:29, Artem Dergachev <noqnoqneo at gmail.com
> <mailto:noqnoqneo at gmail.com>> wrote:
>
> Xerces - DOMRangeImpl.cpp: Ugh, are those "Value assigned
> to..." notes triggered by invalidation? I have a plan of my
> own to wipe such notes out (or at least formulate them more
> carefully), so i guess it's not an action item for you (nor
> much of a regression), but still ugh. Another thing to think
> about for those is to suppress the warning entirely when a
> region that stores is important condition is first assumed to
> contain false, then invalidated, then assumed to contain true.
>
>
> Totally agreed.
>
> Xerces - XMLString.hpp: Mmm, i guess passing unconstrained
> values as parameters is also not very meaningful. It doesn't
> look like it's a *big* problem, but i guess it's worth
> considering fixing eventually.
>
>
> In this particular case, partly due to me not being familiar with
> the code, I found the note helpful, but in LLVM there were plenty
> of reports where I didn't feel this way, so I'll get that fixed.
>
> > Assert conditions are not yet ignored, I have a neat
> solution locally
>
> Just to make sure - do you have a test case with control flow
> within the condition, eg. `assert((a || b) && "the universe is
> collapsing")`?
>
>
> I don't :) Thanks!
>
> On 7/23/19 12:49 PM, Kristóf Umann wrote:
>> I'm happy to delight you all with some more results! I was
>> far more strict on my judgement this time around, marking a
>> report good when I genuinely believed that the note made the
>> report better.
>>
>> All the runs were made on monorepo commit
>> db4e335b8643056f5afa2a0d8d175d95b1fb7e28 with the following
>> patches applied:
>>
>> BEFORE tracking condition: Without any changes being
>> made, default CodeChecker settings.
>> AFTER tracking conditions: Without any changes being made,
>> track-conditions set to true.
>> AFTER tracking conditions + DEBUG notes: Without any changes
>> being made, track-conditions and track-conditions-debug set
>> to true.
>> AFTER linear pruning (or similar run name): [1] applied,
>> track-conditions set to true
>> AFTER tracking conditions WITH improved moderation: [1, 2, 3,
>> 4] applied, track-conditions set to true (stupid name but
>> it's gonna stay like this for now :) )
>> AFTER tracking conditions WITH improved moderation: [1, 2, 3,
>> 4] applied, track-conditions and track-conditions-debug set
>> to true
>> [1] https://reviews.llvm.org/D64232[analyzer] Prune calls to
>> functions with linear CFGs that return a non-zero constrained
>> value
>> [2] https://reviews.llvm.org/D64287 [analyzer] Track the
>> right hand side of the last store regardless of its value
>>
>> [3] https://reviews.llvm.org/D64270[analyzer][NFC] Prepare
>> visitors for different tracking kinds
>>
>> [4] https://reviews.llvm.org/D64272[analyzer] Note last
>> writes to a condition only in a nested stackframe
>>
>>
>> LLVM/Clang (no longer analyzing clang-tools-extra):
>> http://cc.elte.hu:15001/GSoC2019-July-5/#run=LLVM%2FClang%20AFTER%20condition%20tracking%20WITH%20improved%20moderation&review-status=Confirmed&review-status=False%20positive&review-status=Intentional&detection-status=New&detection-status=Reopened&detection-status=Unresolved&tab=LLVM%2FClang%20AFTER%20condition%20tracking%20WITH%20improved%20moderation&is-unique=on
>>
>> Bitcoin:
>> http://cc.elte.hu:15001/GSoC2019-July-5/#run=Bitcoint%20AFTER%20condition%20tracking%20WITH%20improved%20moderation&review-status=Confirmed&review-status=False%20positive&review-status=Intentional&detection-status=New&detection-status=Reopened&detection-status=Unresolved&tab=Bitcoint%20AFTER%20condition%20tracking%20WITH%20improved%20moderation&is-unique=on
>>
>> Xerces:
>> http://cc.elte.hu:15001/GSoC2019-July-5/#run=Xerces%20AFTER%20condition%20tracking%20WITH%20improved%20moderation&review-status=Confirmed&review-status=False%20positive&review-status=Intentional&detection-status=New&detection-status=Reopened&detection-status=Unresolved&tab=Xerces%20AFTER%20condition%20tracking%20WITH%20improved%20moderation&is-unique=on
>>
>> Some conclusions:
>>
>> * [4] Actually notes last writes in /different/, not
>> /nested/ stack frames. Gotta fix that.
>> * [1] Returning an unconstrained value when the function is
>> linear is *never* meaningful. Not for conditions, not for
>> anything else. Gotta fix that too. It's worth talking
>> about whether we should prune these for conditions even
>> if the function isn't linear.
>> * [1] This one gets rid of almost all the operator bool
>> calls. Maybe in the rare case it doesn't, we should
>> preserve them? I always planned to get rid of calls to
>> operator bool, but now I'm hesitant.
>> * Assert conditions are not yet ignored, I have a neat
>> solution locally, but I haven't published or tested it yet.
>>
>
> One more thing: If the condition is a binary operator, like (B ||
> A) is assumed true because A is true, or (A && B) is assumed false
> because A is false, we should only track A.
>
>> There honestly isn't that much I have on my mind, I think
>> after the next round of analyses we should be good! I really
>> should've made an analysis with all the non-condition
>> tracking changes to get an even better perspective, so I'll
>> do that next time.
>>
>> On Tue, 25 Jun 2019 at 04:00, Artem Dergachev
>> <noqnoqneo at gmail.com <mailto:noqnoqneo at gmail.com>> wrote:
>>
>> Yup, looks much better! Let's actually dig through other
>> projects a bit, but results on LLVM look fairly perfect
>> to me.
>>
>>
>> > ASTDiagnostic.cpp
>>
>> Why wasn't the extra tracking removed in the moderate
>> mode? There doesn't seem to be an obvious overwrite for
>> `kind`. Is it reacting to an invalidation?
>>
>> The report is now untouched!
>>
>> > DominanceFrontierImpl.h
>>
>> Why does it track so much more expressions in the
>> moderate mode (even if not producing any actual nodes for
>> them)? Is it because you changed the de-duplication
>> method from by-expr to by-node in between?
>>
>> I'll make a testcase for that. Honestly, I'm thinking of
>> reimplementing the visitor as well, so that I'll collect the
>> control dependencies in a set when the visitor object is
>> created, and track them at BlockExit locations, that should
>> limit it, because yeah, I believe they originate from the
>> de-duplication.
>>
>> > CGObjCGNU.cpp
>>
>> I'd like to think more about this one. It doesn't look to
>> me that the new notes are helpful here, for the reason
>> that the problem is pretty much entirely between the
>> definition of OID and the use, while all of the notes are
>> outside that range. The report wouldn't become less valid
>> (for a certain definition of valid) if we simply omit
>> everything except the last three notes:
>> - 'OID' is initialized to a null pointer value
>> - Loop body skipped when range is empty
>> - Called ++ object pointer is null
>> I don't see any immediate solutions here; i guess the
>> right thing to do would be to simply learn how to chop
>> off the irrelevant beginning of the path.
>>
>>
>> This time around, BodyFarm.cpp is in the same boat. I really
>> just made a fundamentally bad (like, really bad) report even
>> worse.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190809/7abc8ffd/attachment.html>
More information about the cfe-dev
mailing list