[llvm-dev] GEP with a null pointer base

Sean Silva via llvm-dev llvm-dev at lists.llvm.org
Fri Jul 21 17:21:18 PDT 2017


On Fri, Jul 21, 2017 at 5:00 PM, Sean Silva <chisophugis at gmail.com> wrote:

>
>
> On Jul 21, 2017 3:24 PM, "Peter Lawrence" <peterl95124 at sbcglobal.net>
> wrote:
>
>
> On Jul 20, 2017, at 11:22 AM, David Blaikie <dblaikie at gmail.com> wrote:
>
>
>
> On Wed, Jul 19, 2017 at 10:17 AM Peter Lawrence via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Chandler,
>>                The only thing David made clear that wasn’t already clear
>> is that he believes UB to be “comparatively rare”, which agrees with what
>> Hal already said: that he does not expect deleting UB to benefit, for
>> example, the SPEC benchmarks.
>>
>> Given that it is “comparatively rare”, why all the effort to delete it?
>>
>> And why make deleting it the default rather than warning about it?
>>
>
> There seems to be some confusion/misunderstanding here. My best
> understanding is that when David said this:
>
> "The cases where the compiler can statically prove that undefined
> behaviour is present are comparatively rare."
>
> what he was referring to was a contrast with the optimizations described
> just before that.
>
> It's something like this:
>
> UB-based optimizations don't prove UB is present - they optimize on the
> assumption that it is not present due to some unproven (by the compiler,
> but assumed to be known by the developer) invariants in the program.
>
> Think about a simple case like array bounds - the compiler emits an
> unconditional load from memory because it assumes the developer correctly
> validated the bounds, or otherwise constructed the program so that
> out-of-bounds indexes never reach that piece of code. This is quite common:
> UB is assumed not to happen, and the compiler optimizes on that assumption.
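>
> To make that concrete, here is a minimal C sketch (my own illustration;
> the names and sizes are invented):
>
>     int table[16];
>
>     /* The load below is emitted unconditionally: the compiler assumes
>        the caller validated i, because table[i] with i outside [0, 16)
>        would be UB. */
>     int lookup(int i) {
>         int v = table[i];
>         /* And because that load would be UB for i >= 16, the compiler
>            may further assume i < 16 here and fold this check to true. */
>         if (i < 16)
>             return v;
>         return -1;
>     }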
>
> What is less common is for the compiler to be able to prove, in reasonable
> time, that UB /does/ happen. In many cases this would require complex
> interprocedural analysis - the array is defined in one function, maybe with
> a complex dynamic bound, then passed to another function and indexed using
> a non-trivial dynamic expression... statically proving that to be true or
> false is complex/expensive and so basically not done by any compiler. The
> cases that are caught by the compiler are therefore relatively trivial
> ("oh, you declared a const null pointer value, then dereferenced it within
> the same function", etc.) and so don't happen very often, because they're
> also fairly obvious to developers.
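>
> A sketch of my own (not quoted from anyone) of that trivially provable
> kind of case:
>
>     int f(void) {
>         int *const p = 0;   /* a constant null pointer value...          */
>         return *p;          /* ...dereferenced within the same function: */
>                             /* UB the compiler can prove trivially       */
>     }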
>
> Does that help explain the difference/distinction being drawn here?
>
>
>
>
> Dave,
>           perhaps you missed these parts of the discussion
>
> Here is the definition, acknowledged by Hal, of what we’re doing
>
> 1. Sometimes there are abstraction penalties in C++ code
> 2. That can be optimized away after template instantiation, function
>    inlining, etc.
> 3. When they, for example, exhibit this pattern:
>        if (A) {
>                stuff;
>        } else {
>                other stuff including “undefined behavior”;
>        }
> 4. Where the compiler assumes “undefined behavior” doesn’t actually happen,
>    because the C language standard makes it the user’s responsibility to
>    avoid it
> 5. Therefore in this example the compiler can a) delete the else-clause,
>    b) delete the if-cond, and c) assume A is true and propagate that
>    information
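>
> To make the pattern above concrete, here is a minimal C sketch of my own
> (the names are invented); it is the shape such code often has after
> inlining:
>
>     static int get(int *p) {
>         if (p != 0) {          /* A                                     */
>             return *p;         /* stuff                                 */
>         } else {
>             return *p + 1;     /* "other stuff including UB": p is null */
>         }                      /* on this path, so *p is UB             */
>     }
>
>     int caller(int *q) {
>         /* After inlining, the compiler may a) delete the else-clause,
>            b) delete the test, and c) assume q is non-null and propagate
>            that information to any later uses of q. */
>         return get(q);
>     }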
>
>
>
> And here’s the optimization that, according to Sean, we’re using to delete
> UB
>
> [ … ]
>
>
> In other words, if we can prove  "when program statement A executes then
> program statement B is guaranteed to execute and have undefined behavior"
> then we can assume that program statement A is never executed.
>
> In particular, deleting program statement A is a correct transformation.
> Deleting program statement B itself is a special case of this (when A = B).
>
> And yes, this does include everything up to and including `main`,
> intraprocedurally and interprocedurally.
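>
> For example (a sketch of my own with invented names), with A and B in
> different functions:
>
>     static int divide_by(int d) {
>         return 100 / d;          /* B: UB when d == 0                    */
>     }
>
>     int example(int cond) {
>         if (cond)
>             return divide_by(0); /* A: executing A guarantees reaching B
>                                     with d == 0, i.e. guaranteed UB      */
>         /* So the compiler may conclude that A never executes, delete the
>            branch, and treat cond as always zero. */
>         return 1;
>     }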
>
>
> [ … ]
>
>
> -- Sean Silva
>
>
>
> This is entirely a separate issue from what Dan Gohman did to optimize sign
> extension of i32 induction variables out of loops for LP64 target machines,
> where the optimization is justified by “the assumption that UB does not
> happen”, and no actual UB exists either statically or dynamically.
>
>
> Sorry, the way I phrased this in terms of program statements may have made
> this unclear, but this is precisely the particular case A=B that I
> mentioned. In this case, A=B="the induction variable increment"
>

This should have said A=B="the induction variable increments and
overflows". Describing it in terms of general program properties instead of
"program statements" is more mathematically precise in this case: talking
about coarse-grained things like statements is too restrictive for the
induction-variable widening case, where you want to talk about something
dynamically happening in a statement rather than just about whether the
statement executes at all.
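
For readers skimming the thread, here is a minimal C sketch (mine, not Dan
Gohman's original example) of the widening case; the function is invented,
and the point is only that the widening is justified by assuming the signed
increment never overflows:

    long sum(long *a, int n) {
        long s = 0;
        /* If n == INT_MAX, "i++" would eventually overflow.  Signed
           overflow is UB, so the compiler assumes it never happens (the
           A=B case above), which lets it widen the 32-bit "i" to a 64-bit
           induction variable on an LP64 target instead of sign-extending
           "i" for the a[i] address computation on every iteration. */
        for (int i = 0; i <= n; i++)
            s += a[i];
        return s;
    }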

-- Sean Silva


> and we use that to deduce that the statement will not execute and
> overflow, which is what justifies the widening.
>
> Notice that I mentioned that deleting code is only a particular case. In
> the general case we deduce that dynamically something simply does not
> happen, which is what we do in order to prove that induction variable
> widening is safe (overflow cannot happen).
>
> There is nothing special about induction variable widening with respect to
> UB. It is justified by applying the same principles as all other UB-related
> transformations.
>
>
> Briefly, there is only one axiom in the compiler writer's toolbox w.r.t.
> UB and that is "the input program does not execute UB". Everything else is
> derived from that by pure logical reasoning. Does that make sense? Can you
> see how all the descriptions I gave above are derivable from that axiom?
>
> The common application of this is that we can assume any program property
> P whatsoever (not just liveness, but variable ranges, etc.) if we can prove
> that the program would execute UB should that property P fail to hold.
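>
> For instance (again a sketch of my own), with the range-style property
> P = "divisor != 0":
>
>     int scale(int x, int divisor) {
>         int q = x / divisor;     /* UB if divisor == 0                   */
>         /* If divisor were 0, the division above would already have been
>            UB, so the compiler may assume P holds here and fold this
>            check away. */
>         if (divisor == 0)
>             return 0;
>         return q;
>     }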
>
>
> -- Sean Silva
>
>
> But when it comes to actual, provable UB the plan is to delete it.
> On that there is no confusion, and there is no misunderstanding.
>
>
> Peter Lawrence.
>
>
>
>
>
> - Dave
>
>>
>> Peter
>>
>>
>> On Jul 13, 2017, at 2:15 PM, Chandler Carruth <chandlerc at gmail.com>
>> wrote:
>>
>> On Thu, Jul 13, 2017 at 5:13 PM Peter Lawrence via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> David,
>>>           Here is the definition accepted by Hal of what we’re doing
>>>
>>> > 1. Sometimes there are abstraction penalties in C++ code
>>> > 2. That can be optimized away after template instantiation, function
>>> >    inlining, etc.
>>> > 3. When they, for example, exhibit this pattern:
>>> >       if (A) {
>>> >               stuff;
>>> >       } else {
>>> >               other stuff including “undefined behavior”;
>>> >       }
>>> > 4. Where the compiler assumes “undefined behavior” doesn’t actually
>>> >    happen, because the C language standard makes it the user’s
>>> >    responsibility to avoid it
>>> > 5. Therefore in this example the compiler can a) delete the else-clause,
>>> >    b) delete the if-cond, and c) assume A is true and propagate that
>>> >    information
>>>
>>>
>>>
>>> We are actively deleting undefined behavior, and the question is why,
>>> given that doing so potentially masks a real source-code bug.
>>> At the very least, deleting undefined behavior should not be the default.
>>>
>>
>> You are asserting this (again), but others have clearly stated that they
>> disagree. David gave detailed and clear reasons why. Continuing to re-state
>> positions is not productive.
>>
>> -Chandler
>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
>
>