[llvm-dev] GEP with a null pointer base

Mon Jul 24 09:08:38 PDT 2017

On Mon, Jul 24, 2017 at 9:02 AM Peter Lawrence <peterl95124 at sbcglobal.net>
wrote:

> On Jul 21, 2017, at 10:55 PM, Mehdi AMINI <joker.eph at gmail.com> wrote:
>
>
>
> 2017-07-21 22:44 GMT-07:00 Peter Lawrence <peterl95124 at sbcglobal.net>:
>
>> Mehdi,
>>            Hal’s transformation only kicks in in the *presence* of UB
>>
>
> No, sorry I entirely disagree with this assertion: I believe we optimize
> program where there is no UB. We delete dead code, code that never runs, so
> it is code that does not exercise UB.
>
>
>
> Mehdi,
>       I had to read that sentence several times to figure out what the
> problem is, which is sloppy terminology on my part.
>
> Strictly speaking, the C standard uses “undefined behavior” to describe what
> happens at runtime when an “illegal” construct is executed.  I have been
> using “undefined behavior” and UB to describe the “illegal” construct
> whether it is executed or not.
>
> Hence I say “Hal’s transform is triggered by UB”, when I should be saying
> “Hal’s transformation is triggered by illegal IR”.
>
> All I can say is I’m not the only one being sloppy: what started this
> entire conversation is the paper titled “Taming Undefined Behavior in LLVM”,
> while the correct title would be “Taming Illegal IR in LLVM”.  (I think we
> are all pretty confident that LLVM itself is UB-free, or at least we all
> hope so :-).
> I believe you are being sloppy when you say “we optimize program
> where there is no UB”, because I believe you mean “we optimize program
> under the assumption that there is no UB”. In other words, we recognize
> “illegal” constructs and then assume they are unreachable, and delete
> them, even when we can’t prove by any other means that they are
> unreachable. We don’t know that there is no (runtime) UB, we just assume
> it.
>
>
> The example Hal showed does not exhibit UB, it is perfectly valid
> according to the standard.
>
>
> Whether it exhibits UB at runtime or not is not the issue; the issue is
> what a static analyzer or compiler can tell before runtime, see below.
>
>
>
>> , and
>> it does not matter how that UB got there, whether by function inlining
>> or without function inlining.
>>
>> The problem with Hal’s argument is that the compiler does not have
>> a built in ouija board with which it can conjure up the spirit of the
>> author of the source code and find out if the UB was intentional
>> with the expectation of it being deleted, or is simply a bug.
>> Function inlining does not magically turn a bug into not-a-bug, nor
>> does post-inlining simplification magically turn a bug into not-a-bug.
>>
>> Let me say it again:  if the compiler can find this UB (after whatever
>> optimizations it takes to get there) then the static analyzer must
>> be able to do the same thing, forcing the programmer to fix it
>> rather than have the compiler optimize it.
>>
>
> This is again incorrect: there is no UB in the program, there is nothing
> the static analyzer should report.
>
>
>
> Hal’s example starts with this template
>
> template <typename T>
> int do_something(T mask, bool cond) {
>   if (mask & 2)
>     return 42;
>
>   if (cond) {
>     T high_mask = mask >> 48;  // UB if sizeof(T) < 8 and cond is true
>     if (high_mask > 5)
>       do_something_1(high_mask);
>     else
>       do_something_2();
>   }
>
>   return 0;
> }
>
>
> Which is then instantiated with T = char,
> and where it is impossible for either a static analyzer or a
> compiler to figure out and prove that ‘cond’ is always false.
>
> Hence a static analyzer issues a warning about the shift,
> while llvm gives no warning and instead optimizes the entire
> if-statement away on the assumption that it is unreachable.
>
> Yes a static analyzer does issue a warning in this case.
>
>
> This is not the only optimization to be based on assumption
> rather than fact; for example, type-based alias analysis is
> based on the assumption that the program is free of this sort
> of aliasing. The difference is that a user can disable TBAA,
> and only TBAA, if a program seems to be running incorrectly
> when optimized, and thereby possibly track down a bug. So far
> there is no command-line option to disable UB-based analysis
> (or “illegal-IR-based” :-), but there really needs to be one.
>
> Do we at least agree on that last paragraph?
>
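
On the TBAA comparison quoted above: as far as I know, the switch in question is -fno-strict-aliasing in both Clang and GCC. Below is a minimal sketch of the kind of code it concerns. float_bits and float_bits_usage are hypothetical names used only for illustration, and the sketch assumes a 32-bit IEEE 754 float.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reads the bit pattern of a float without violating the
// aliasing rules that type-based alias analysis relies on.
inline std::uint32_t float_bits(float f) {
  static_assert(sizeof(float) == sizeof(std::uint32_t),
                "sketch assumes a 32-bit float");
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);  // well defined: copies the bytes
  return bits;
}

// The form that TBAA is allowed to assume never happens (shown as a comment
// only, because executing it is undefined behavior):
//   std::uint32_t bits = *reinterpret_cast<std::uint32_t *>(&f);

// Usage sketch; the second check assumes IEEE 754 single precision.
inline void float_bits_usage() {
  assert(float_bits(0.0f) == 0u);
  assert(float_bits(1.0f) == 0x3F800000u);
}
```

If a program only misbehaves when the commented-out form is optimized, rebuilding with strict aliasing disabled is one way to narrow the problem down.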

We likely agree it's good to have tools to help developers
identify/diagnose UB in their programs. And we have that:
-fsanitize=undefined. Not only does it effectively disable many UB-based
optimizations (it makes the behavior defined, by conditionalizing the code
to check that UB isn't reached), it even provides pretty diagnostics. Of
course, you can't actually continue running the program if the line after
the diagnostic will dereference a null pointer: there's no non-null pointer
we can magic up, so execution must stop.
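
To make "conditionalizing the code" concrete, here is a minimal hand-written sketch of the kind of guard involved. It is not how UBSan is implemented; checked_shift_right and shift_usage are hypothetical names introduced only for illustration, and the sketch assumes int is narrower than 48 bits, as on common targets.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper (not part of LLVM or UBSan): performs `value >> amount`
// only when the shift is defined, i.e. when the amount is smaller than the bit
// width of the promoted left operand. Returns false instead of executing UB.
template <typename T>
bool checked_shift_right(T value, unsigned amount, T &result) {
  typedef decltype(value >> 0) Promoted;  // type after integer promotion
  const unsigned width = sizeof(Promoted) * CHAR_BIT;
  if (amount >= width)
    return false;  // roughly where a sanitizer would print its diagnostic
  result = static_cast<T>(value >> amount);
  return true;
}

// Usage sketch mirroring the `mask >> 48` shift in the quoted template.
inline void shift_usage() {
  char narrow_out = 0;
  // T = char: the operand is promoted to int, so a shift by 48 is rejected.
  assert(!checked_shift_right<char>(0x40, 48, narrow_out));

  long long wide_out = 0;
  // T = long long: 48 < 64, so the shift is well defined.
  assert(checked_shift_right<long long>(7LL << 48, 48, wide_out));
  assert(wide_out == 7);
}
```

Building the real program with -fsanitize=undefined asks the compiler to insert checks of this general kind itself, so nothing like the helper above has to be written by hand.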

>
>
> Peter Lawrence.
>
>
>
>
>
>
> The compiler is still able to delete some code, because of breaking the
> abstraction through inlining or template instantiation for example (cf.
> Hal's example).
>
> --
> Mehdi
>
>
>
>>
>> Or, to put it another way:  there is no difference between a compiler
>> and a static analyzer [*]. So regardless of whether it is the compiler or
>> the static analyzer that finds any UB, the only rational thing to do with
>> it is report it as a bug.
>>
>>
>> Peter Lawrence.
>>
>>
>> [* in fact that’s one of the primary reasons Apple adopted llvm, to use
>>   it as a base for static analysis]
>>
>>
>>
>> On Jul 21, 2017, at 10:03 PM, Mehdi AMINI <joker.eph at gmail.com> wrote:
>>
>>
>>
>> 2017-07-21 21:27 GMT-07:00 Peter Lawrence <peterl95124 at sbcglobal.net>:
>>
>>> Sean,
>>>      Let me re-phrase a couple words to make it perfectly clear
>>>
>>> On Jul 21, 2017, at 6:29 PM, Peter Lawrence <peterl95124 at sbcglobal.net>
>>> wrote:
>>>
>>> Sean,
>>>
>>> Dan Gohman’s “transform” changes a loop induction variable, but does not
>>> change the CFG,
>>>
>>> Hal’s “transform” deletes blocks out of the CFG, fundamentally altering
>>> it.
>>>
>>> These are two totally different transforms.
>>>
>>>
>>>
>>> And even the analysis is different,
>>>
>>> The first is based on an *assumption* of non-UB (actually there is no
>>> analysis to perform)
>>>
>>>                        the *absence* of UB
>>>
>>>
>>> the second is based on a *proof* of existence of UB (here typically some
>>> non-trivial analysis is required)
>>>
>>>                         the *presence* of UB
>>>
>>> These have, practically speaking, nothing in common.
>>>
>>>
>>>
>>> In particular, the first is an optimization, while the second is a
>>> transformation that fails to be an optimization because the opportunity
>>> for it happening in real world code that is expected to pass compilation
>>> without warnings, static analysis without warnings, and dynamic
>>> sanitizers without warnings, is zero.
>>>
>>> Or to put it another way, if llvm manages to find some UB that no
>>> analyzer or sanitizer does, and then deletes the UB, then the author of
>>> that part of llvm is in the wrong group, and belongs over in the analyzer
>>> and/or sanitizer group.
>>>
>>
>> I don't understand your claim, it does not match at all my understanding
>> of what we managed to agree on in the past.
>>
>> The second transformation (dead code elimination to simplify) is based on
>> the assumption that there is no UB.
>>
>> I.e. after inlining for example, the extra context of the calling
>> function allows us to deduce the value of some conditional branching in the
>> inlined body based on the impossibility of one of the paths *in the context
>> of this particular caller*.
>>
>> This does not mean that the program written by the programmer has any UB
>> inside.
>>
>> This is exactly the example that Hal gave.
>>
>> This can't be used to expose any meaningful information to the
>> programmer, because it would be full of false positives. Basically a program
>> could be clean of any static analyzer error, of any UBSAN error, and
>> totally UB-free, and still exhibit tons and tons of such issues.
>>
>> --
>> Mehdi
>>
>>
>>
>
>