[llvm-dev] GEP with a null pointer base

Peter Lawrence via llvm-dev llvm-dev at lists.llvm.org
Fri Jul 7 13:40:17 PDT 2017

> On Jul 6, 2017, at 3:07 PM, Chris Lattner <clattner at nondot.org> wrote:
>> On Jul 6, 2017, at 2:05 PM, Peter Lawrence via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>> On Jul 6, 2017, at 1:00 PM, via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>>    So far, so good.  The problem is that while LLVM seems to consider
>>>>    the above IR to be valid, we officially do not allow dereferencing
>>>>    a pointer constructed in this way (if I’m reading the rules
>>>>    correctly).  Consequently, if this GEP ever gets close enough to a
>>>>    load using the pointer, InstCombine will eliminate the GEP and the
>>>>    load.
>> This is the part that confuses me, why would such code be eliminated.
>> If it is illegal then this should be a compilation failure,
> This is illegal code, and if we only cared about the C spec, we could at least warn about it if not reject it outright.
> That said, the purpose of clang is to build real code, and real code contains some amount of invalid constructs in important code bases.  We care about building a pragmatic compiler that gets the job done, so we sometimes “make things work” even though we don’t have to.  There are numerous patterns in old-style “offsetof” macros that do similar things.  Instead of fighting to make all the world’s code be theoretically ideal, it is better to just eat it and “do what they meant”.

The issue the original poster raised is that, instead of a compiler that,
as you say, “makes things work” and “gets the job done”, we have a compiler
that intentionally deletes code with undefined behavior, on the assumption that,
since it is the user’s responsibility to avoid UB, such code must be unreachable
and is therefore safe to delete.

It seems like there are three things the compiler could do with undefined behavior:
1) let the code go through (perhaps with a warning)
2) replace the code with a trap
3) optimize the code as unreachable (no warning, because we’re assuming this is the user’s intention)

It looks like 3 is the LLVM default, but IMHO it is the least desirable choice:
real-world examples showing its benefit are practically non-existent,
and it can mask a real source-code bug.

In spite of option 3 being (IMHO) the least desirable choice, considerable
resources are being devoted to implementing it, and the work does not seem
to follow good software-engineering practice.

This optimization seems to fit the “compiler design pattern” of a separate
analysis pass and transform pass, where “poison” is an attribute that gets
forward-propagated through expressions and assignments until it reaches some
instruction that turns “poison” into “undefined behavior”, after which the
block containing the UB can be deleted.

Putting this analysis and transform into a separate pass means that the
LangRef and the IR can be cleaned up: there would be no reason to have “poison”
and “freeze” in the IR, and no other pass would have to deal with them.

Some folks are saying “damn the torpedoes, full speed ahead” on option 3
in its least software-engineering-friendly form; others are saying “wait a minute,
let’s slow down, take a deep breath, and consider the big picture first.”

Thoughts?
Comments?
Questions?

Peter Lawrence.

> -Chris
