[llvm-dev] GEP with a null pointer base

Sean Silva via llvm-dev llvm-dev at lists.llvm.org
Wed Jul 26 21:12:36 PDT 2017


On Wed, Jul 26, 2017 at 8:40 PM, Peter Lawrence <peterl95124 at sbcglobal.net>
wrote:

>
>
>
> Message: 5
> Date: Tue, 25 Jul 2017 00:12:35 -0700
> From: Sean Silva via llvm-dev <llvm-dev at lists.llvm.org>
> To: Peter Lawrence <peterl95124 at sbcglobal.net>
> Cc: llvm-dev <llvm-dev at lists.llvm.org>, John Regehr
> <regehr at cs.utah.edu>
> Subject: Re: [llvm-dev] GEP with a null pointer base
>
> On Fri, Jul 21, 2017 at 6:29 PM, Peter Lawrence <peterl95124 at sbcglobal.net
> >
> wrote:
>
> Sean,
>
> Dan Gohman’s “transform” changes a loop induction variable, but does not
> change the CFG,
>
> Hal’s “transform” deletes blocks out of the CFG, fundamentally altering it.
>
> These are two totally different transforms.
>
> And even the analysis is different,
>
> The first is based on an *assumption* of non-UB (actually there is no
> analysis to perform)
>
> the second Is based on a *proof* of existence of UB (here typically some
> non-trivial analysis is required)
>
> These have, practically speaking, nothing in common.
>
>
>
> Ah, I think I see your point of confusion now. They are actually the same
> thing, but one has a hidden step. You can consider the induction variable
> transformation to be something like this:
>
>
> ```
> a = add nsw i32 b, c
> ```
>
> 1. ==> widen `a` and expand the add into:
>
> ```
> if ("add i32 b, c" does not overflow) {
>  a = add nsw i64 b, c
> } else {
>  a = (cast to i64) add nsw i32 b, c
> }
> ```
>
> 2. ==> if the `else` branch executes, then we provably have UB.
>
> ```
> a = add nsw i64 b, c
> ```
>
>
> The last step is the same as in Hal's example where we delete a part of the
> CFG by proving that if we reach it there is guaranteed to be UB. In Hal's
> example, the frontend marked the shift as having UB if the shift amount
> exceeds the bitwidth. In this case, the frontend marked it as
> non-overflowing. In this case, the fact that it is guaranteed to overflow
> in the `else` branch has to be proven by looking at a dominating
> conditional check instead of being manifest from a peephole observation
> (after inlining/constant folding) as it is in Hal's example.
>
> Hopefully that makes sense. If not, could you indicate specifically which
> of transformations 1. and/or 2. you think is illegal and why?
>
> -- Sean Silva
>
>
>
> Sean,
>           Yes I can see your point, but in practice that’s not how it’s
> done,
> no one expands the “+nsw” into control flow and then applies Hal’s
> transform to it to get Dan’s transform.  Dan’s transform doesn’t alter
> the _original_ CFG and is done as part of induction-variable analysis
> and simplification. Hal’s does alter the _original_ CFG, and uses a totally
> different analysis as well as different transformation. The only thing they
> have in common is the assumption that the program is UB-free at runtime,
> but they have nothing in common in implementation.
>
> So I’m not saying your transforms 1 & 2 are illegal, I’m saying they are
> unnecessary.
>
>
> And we’re missing the point, which is this, why ever delete UB code
> which can cover up a bug, and more importantly, why ever make that
> the default, given the emphasis in software engineering of avoiding UB
> at the source level.  Why are we implementing both -fsanitize=undefined
> and deleting undefined as an optimization.
>


Did you ever manage to get those performance numbers on 32-bit code with
-fwrapv? In the other thread you volunteered to do that as a means of
demonstrating that at least the +nsw transformations (excluding induction
variable widening) were actually unnecessary for performance, contrary to
everybody else's beliefs (and supporting your own position).

-- Sean Silva


>
>
> Peter.
>
>
>
>
>
>
>
> Peter Lawrence.
>
>
>
> On Jul 21, 2017, at 5:00 PM, Sean Silva <chisophugis at gmail.com> wrote:
>
>
>
> On Jul 21, 2017 3:24 PM, "Peter Lawrence" <peterl95124 at sbcglobal.net>
> wrote:
>
>
> On Jul 20, 2017, at 11:22 AM, David Blaikie <dblaikie at gmail.com> wrote:
>
>
>
> On Wed, Jul 19, 2017 at 10:17 AM Peter Lawrence via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> Chandler,
>               The only thing David made clear that wasn’t already clear
> is that he believes UB to be “comparatively rare”, which is in agreement
> with what Hal already said which is that he does not expect deleting
> UB will be of benefit to for example SPEC benchmarks.
>
> Given that it is “comparatively rare”, why all the effort to delete it ?
>
> And why make deleting it the default rather than warning about it ?
>
>
>
> There seems to be some confusion/misunderstanding here. My best
> understanding is that when David said this:
>
> "The cases where the compiler can statically prove that undefined
> behaviour is present are comparatively rare."
>
> What he was referring to/describing was a contrast with the optimizations
> described prior to that.
>
> It's something like this:
>
> UB-based optimizations don't prove UB is present - they optimize on the
> assumption that it is not present due to some unproven (by the compiler,
> but assumed to be known by the developer) invariants in the program.
>
> Think about a simple case like array bounds - the compiler emits an
> unconditional load to the memory because it assumes the developer correctly
> validated the bounds or otherwise constructed so that out of bounds indexes
> never reach that piece of code. This is quite common - that UB is assumed
> to not happen, and the compiler optimizes on this fact.
>
> What is less common, is for the compiler to be able to (in reasonable
> time) prove that UB /does/ happen (in many cases this would require complex
> interprocedural analysis - the array is defined in one function, maybe with
> a complex dynamic bound, then passed to another function and indexed using
> a non-trivial dynamic expression... statically proving that to be true or
> false is complex/expensive and so basically not done by any compiler - so
> any cases that are caught by the compiler are relatively trivial ("oh, you
> declared a const null pointer value, then dereferenced it within the same
> function", etc) & so don't happen very often (because they're also fairly
> obvious to developers too))
>
> Does that help explain the difference/distinction being drawn here?
>
>
>
>
> Dave,
>          perhaps you missed these parts of the discussion
>
> Here is the definition, acknowledged by Hal, of what we’re doing
>
> 1. Sometimes there are abstraction penalties in C++ code
> 2. That can be optimized away after template instantiation, function
> inlining, etc
> 3. When they for example exhibit this pattern
> if (A) {
> stuff;
> } else {
> other stuff including “undefined behavior”;
> }
> 4. Where the compiler assumes “undefined behavior” doesn’t actually happen
> because
>  In the C language standard it is the users responsibility to avoid it
> 5. Therefore in this example the compiler can a) delete the else-clause
>   b) delete the if-cond, c) assume A is true and propagate that
> information
>
>
>
> And, here’s the optimization that according to Sean we’re using to delete
> UB
>
> [ … ]
>
>
> In other words, if we can prove  "when program statement A executes then
> program statement B is guaranteed to execute and have undefined behavior"
> then we can assume that program statement A is never executed.
>
> In particular, deleting program statement A is a correct transformation.
> Deleting program statement B itself is a special case of this (when A = B).
>
> And yes, this does include everything up to and including `main`,
> intraprocedurally and interprocedurally.
>
>
> [ … ]
>
>
> -- Sean Silva
>
>
>
> This is entirely a separate issue from what Dan Gohman did to optimize
> sign extension
> of i32 induction variables out of loops for LP64 target machines, where
> the optimization
> is justified based on “the assumption that UB does not happen”, and no
> actual UB
> exists either statically or dynamically.
>
>
> Sorry, the way I phrased this in terms of program statements may have made
> this unclear, but this is precisely the particular case A=B that I
> mentioned. In this case, A=B="the induction variable increment" and we use
> that to deduce that the statement will not execute and overflow, which is
> what justifies the widening.
>
> Notice that I mentioned that deleting code is only a particular case. In
> the general case we deduce that dynamically something simply does not
> happen, which is what we do in order to prove that induction variable
> widening is safe (overflow cannot happen).
>
> There is nothing special about induction variable widening with respect to
> UB. It is justified by applying the same principles as all other UB-related
> transformations.
>
>
> Briefly, there is only one axiom in the compiler writer's toolbox w.r.t.
> UB and that is "the input program does not execute UB". Everything else is
> derived from that by pure logical reasoning. Does that make sense? Can you
> see how all the descriptions I gave above are derivable from that axiom?
>
> The common application of this is that we can assume any program property
> P whatsoever (not just liveness, but variable ranges, etc.) if we can prove
> that the program would execute UB should that property P fail to hold.
>
>
> -- Sean Silva
>
>
> But when it comes to actual provable UB the plan is to delete it.
> On that there is no confusion, and there is no mis-understanding.
>
>
> Peter Lawrence.
>
>
>
>
>
> - Dave
>
>
> Peter
>
>
> On Jul 13, 2017, at 2:15 PM, Chandler Carruth <chandlerc at gmail.com>
> wrote:
>
> On Thu, Jul 13, 2017 at 5:13 PM Peter Lawrence via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> David,
>          Here is the definition accepted by Hal of what we’re doing
>
> 1. Sometimes there are abstraction penalties in C++ code
> 2. That can be optimized away after template instantiation, function
>
> inlining, etc
>
> 3. When they for example exhibit this pattern
>      if (A) {
>              stuff;
>      } else {
>              other stuff including “undefined behavior”;
>      }
> 4. Where the compiler assumes “undefined behavior” doesn’t actually
>
> happen because
>
>   In the C language standard it is the users responsibility to avoid
>
> it
>
> 5. Therefore in this example the compiler can a) delete the else-clause
>    b) delete the if-cond, c) assume A is true and propagate that
>
> information
>
>
>
> We are actively deleting undefined behavior, and the question is why
> given that doing so potentially masks a real source code bug.
> At the very least deleting undefined behavior should not be the default.
>
>
> You are asserting this (again), but others have clearly stated that they
> disagree. David gave detailed and clear reasons why. Continuing to re-state
> positions is not productive.
>
> -Chandler
>
>
> _______________________________________________
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170726/7b08152b/attachment-0001.html>


More information about the llvm-dev mailing list