[llvm-dev] GEP with a null pointer base

Peter Lawrence via llvm-dev <llvm-dev at lists.llvm.org>
Wed Jul 26 20:40:40 PDT 2017


> 
> 
> Message: 5
> Date: Tue, 25 Jul 2017 00:12:35 -0700
> From: Sean Silva via llvm-dev <llvm-dev at lists.llvm.org>
> To: Peter Lawrence <peterl95124 at sbcglobal.net>
> Cc: llvm-dev <llvm-dev at lists.llvm.org>, John Regehr
> 	<regehr at cs.utah.edu>
> Subject: Re: [llvm-dev] GEP with a null pointer base
> 
> On Fri, Jul 21, 2017 at 6:29 PM, Peter Lawrence <peterl95124 at sbcglobal.net>
> wrote:
> 
>> Sean,
>> 
>> Dan Gohman’s “transform” changes a loop induction variable, but does not
>> change the CFG,
>> 
>> Hal’s “transform” deletes blocks out of the CFG, fundamentally altering it.
>> 
>> These are two totally different transforms.
>> 
>> And even the analysis is different,
>> 
>> The first is based on an *assumption* of non-UB (actually there is no
>> analysis to perform)
>> 
>> the second Is based on a *proof* of existence of UB (here typically some
>> non-trivial analysis is required)
>> 
>> These have, practically speaking, nothing in common.
>> 
> 
> 
> Ah, I think I see your point of confusion now. They are actually the same
> thing, but one has a hidden step. You can consider the induction variable
> transformation to be something like this:
> 
> 
> ```
> a = add nsw i32 b, c
> ```
> 
> 1. ==> widen `a` and expand the add into:
> 
> ```
> if ("add i32 b, c" does not overflow) {
>  a = add nsw i64 b, c
> } else {
>  a = (cast to i64) add nsw i32 b, c
> }
> ```
> 
> 2. ==> if the `else` branch executes, then we provably have UB.
> 
> ```
> a = add nsw i64 b, c
> ```
> 
> 
> The last step is the same as in Hal's example where we delete a part of the
> CFG by proving that if we reach it there is guaranteed to be UB. In Hal's
> example, the frontend marked the shift as having UB if the shift amount
> exceeds the bitwidth. In this case, the frontend marked it as
> non-overflowing. In this case, the fact that it is guaranteed to overflow
> in the `else` branch has to be proven by looking at a dominating
> conditional check instead of being manifest from a peephole observation
> (after inlining/constant folding) as it is in Hal's example.
> 
> Hopefully that makes sense. If not, could you indicate specifically which
> of transformations 1. and/or 2. you think is illegal and why?
> 
> -- Sean Silva
> 


Sean,
          Yes, I can see your point, but in practice that’s not how it’s done:
no one expands the “+nsw” into control flow and then applies Hal’s
transform to it to get Dan’s transform.  Dan’s transform doesn’t alter
the _original_ CFG and is done as part of induction-variable analysis
and simplification. Hal’s does alter the _original_ CFG, and it uses a totally
different analysis as well as a different transformation. The only thing they
have in common is the assumption that the program is UB-free at runtime;
they have nothing in common in implementation.

So I’m not saying your transforms 1 and 2 are illegal; I’m saying they are
unnecessary.


And we’re missing the point, which is this: why ever delete UB code,
which can cover up a bug, and more importantly, why ever make that
the default, given the emphasis in software engineering on avoiding UB
at the source level?  Why are we implementing both -fsanitize=undefined
and deleting undefined behavior as an optimization?


Peter.




> 
>> 
>> 
>> Peter Lawrence.
>> 
>> 
>> 
>> On Jul 21, 2017, at 5:00 PM, Sean Silva <chisophugis at gmail.com> wrote:
>> 
>> 
>> 
>> On Jul 21, 2017 3:24 PM, "Peter Lawrence" <peterl95124 at sbcglobal.net>
>> wrote:
>> 
>> 
>> On Jul 20, 2017, at 11:22 AM, David Blaikie <dblaikie at gmail.com> wrote:
>> 
>> 
>> 
>> On Wed, Jul 19, 2017 at 10:17 AM Peter Lawrence via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>> 
>>> Chandler,
>>>               The only thing David made clear that wasn’t already clear
>>> is that he believes UB to be “comparatively rare”, which is in agreement
>>> with what Hal already said, namely that he does not expect deleting
>>> UB to benefit, for example, the SPEC benchmarks.
>>> 
>>> Given that it is “comparatively rare”, why all the effort to delete it?
>>> 
>> And why make deleting it the default rather than warning about it?
>>> 
>> 
>> There seems to be some confusion/misunderstanding here. My best
>> understanding is that when David said this:
>> 
>> "The cases where the compiler can statically prove that undefined
>> behaviour is present are comparatively rare."
>> 
>> What he was referring to/describing was a contrast with the optimizations
>> described prior to that.
>> 
>> It's something like this:
>> 
>> UB-based optimizations don't prove UB is present - they optimize on the
>> assumption that it is not present due to some unproven (by the compiler,
>> but assumed to be known by the developer) invariants in the program.
>> 
>> Think about a simple case like array bounds - the compiler emits an
>> unconditional load to the memory because it assumes the developer correctly
>> validated the bounds or otherwise constructed so that out of bounds indexes
>> never reach that piece of code. This is quite common - that UB is assumed
>> to not happen, and the compiler optimizes on this fact.
>> 
>> What is less common, is for the compiler to be able to (in reasonable
>> time) prove that UB /does/ happen (in many cases this would require complex
>> interprocedural analysis - the array is defined in one function, maybe with
>> a complex dynamic bound, then passed to another function and indexed using
>> a non-trivial dynamic expression... statically proving that to be true or
>> false is complex/expensive and so basically not done by any compiler - so
>> any cases that are caught by the compiler are relatively trivial ("oh, you
>> declared a const null pointer value, then dereferenced it within the same
>> function", etc) & so don't happen very often (because they're also fairly
>> obvious to developers too))
>> 
>> Does that help explain the difference/distinction being drawn here?
>> 
>> 
>> 
>> 
>> Dave,
>>          perhaps you missed these parts of the discussion
>> 
>> Here is the definition, acknowledged by Hal, of what we’re doing
>> 
>> 1. Sometimes there are abstraction penalties in C++ code
>> 2. That can be optimized away after template instantiation, function
>> inlining, etc
>> 3. When they for example exhibit this pattern
>> if (A) {
>>         stuff;
>> } else {
>>         other stuff including “undefined behavior”;
>> }
>> 4. Where the compiler assumes “undefined behavior” doesn’t actually happen
>> because
>>  In the C language standard it is the user’s responsibility to avoid it
>> 5. Therefore in this example the compiler can a) delete the else-clause
>>   b) delete the if-cond, c) assume A is true and propagate that
>> information
>> 
>> 
>> 
>> And, here’s the optimization that according to Sean we’re using to delete
>> UB
>> 
>> [ … ]
>> 
>> 
>> In other words, if we can prove  "when program statement A executes then
>> program statement B is guaranteed to execute and have undefined behavior"
>> then we can assume that program statement A is never executed.
>> 
>> In particular, deleting program statement A is a correct transformation.
>> Deleting program statement B itself is a special case of this (when A = B).
>> 
>> And yes, this does include everything up to and including `main`,
>> intraprocedurally and interprocedurally.
>> 
>> 
>> [ … ]
>> 
>> 
>> -- Sean Silva
>> 
>> 
>> 
>> This is entirely a separate issue from what Dan Gohman did to optimize
>> sign extension
>> of i32 induction variables out of loops for LP64 target machines, where
>> the optimization
>> is justified based on “the assumption that UB does not happen”, and no
>> actual UB
>> exists either statically or dynamically.
>> 
>> 
>> Sorry, the way I phrased this in terms of program statements may have made
>> this unclear, but this is precisely the particular case A=B that I
>> mentioned. In this case, A=B="the induction variable increment" and we use
>> that to deduce that the statement will not execute and overflow, which is
>> what justifies the widening.
>> 
>> Notice that I mentioned that deleting code is only a particular case. In
>> the general case we deduce that dynamically something simply does not
>> happen, which is what we do in order to prove that induction variable
>> widening is safe (overflow cannot happen).
>> 
>> There is nothing special about induction variable widening with respect to
>> UB. It is justified by applying the same principles as all other UB-related
>> transformations.
>> 
>> 
>> Briefly, there is only one axiom in the compiler writer's toolbox w.r.t.
>> UB and that is "the input program does not execute UB". Everything else is
>> derived from that by pure logical reasoning. Does that make sense? Can you
>> see how all the descriptions I gave above are derivable from that axiom?
>> 
>> The common application of this is that we can assume any program property
>> P whatsoever (not just liveness, but variable ranges, etc.) if we can prove
>> that the program would execute UB should that property P fail to hold.
>> 
>> 
>> -- Sean Silva
>> 
>> 
>> But when it comes to actual provable UB the plan is to delete it.
>> On that there is no confusion, and there is no misunderstanding.
>> 
>> 
>> Peter Lawrence.
>> 
>> 
>> 
>> 
>> 
>> - Dave
>> 
>>> 
>>> Peter
>>> 
>>> 
>>> On Jul 13, 2017, at 2:15 PM, Chandler Carruth <chandlerc at gmail.com>
>>> wrote:
>>> 
>>> On Thu, Jul 13, 2017 at 5:13 PM Peter Lawrence via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>> 
>>>> David,
>>>>          Here is the definition accepted by Hal of what we’re doing
>>>> 
>>>>> 1. Sometimes there are abstraction penalties in C++ code
>>>>> 2. That can be optimized away after template instantiation, function
>>>> inlining, etc
>>>>> 3. When they for example exhibit this pattern
>>>>>      if (A) {
>>>>>              stuff;
>>>>>      } else {
>>>>>              other stuff including “undefined behavior”;
>>>>>      }
>>>>> 4. Where the compiler assumes “undefined behavior” doesn’t actually
>>>> happen because
>>>>>   In the C language standard it is the user’s responsibility to avoid
>>>> it
>>>>> 5. Therefore in this example the compiler can a) delete the else-clause
>>>>>    b) delete the if-cond, c) assume A is true and propagate that
>>>> information
>>>> 
>>>> 
>>>> 
>>>> We are actively deleting undefined behavior, and the question is why
>>>> given that doing so potentially masks a real source code bug.
>>>> At the very least deleting undefined behavior should not be the default.
>>>> 
>>> 
>>> You are asserting this (again), but others have clearly stated that they
>>> disagree. David gave detailed and clear reasons why. Continuing to re-state
>>> positions is not productive.
>>> 
>>> -Chandler
>>> 
>>> 
