[llvm-dev] GEP with a null pointer base

Chandler Carruth via llvm-dev llvm-dev at lists.llvm.org
Sun Jul 9 22:45:45 PDT 2017

On Sun, Jul 9, 2017 at 9:24 PM David Majnemer via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> On Sun, Jul 9, 2017 at 1:10 PM, Marcin Słowik via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>> Can we go back a little?
>>> 1) Add a new transformation to InstCombine that will replace
>>> 'getelementptr i8, i8* null, <ty> %n' with 'inttoptr <ty> %n to i8*' when
>>> <ty> has the same size as a pointer for the target architecture.
>> What's the actual problem with this approach? I personally find it the
>> most compelling - it is well-defined (well, somewhat), front-end agnostic
>> (and assume some front ends may find this kind of pointer arithmetic to be
>> well-defined) and predictable.
>> I would even extend it to allow offsets of different types to be used,
>> with additional zero-extension when applicable.
> This would make correctness of a program dependent on running a particular
> optimization pass, something which is not sound from a semantics point of
> view (what if another pass sees the gep of null before InstCombine does,
> etc.).
> LLVM IR has semantics, properties which we are supposed to use to reason
> about what a particular piece of IR does. This proposed transformation,
> while legal, is not mandatory. Making it mandatory means more than adding
> one particular change to InstCombine: it means a change to the semantics of
> LLVM IR. This way we require that all passes, analyses, etc. treat gep null
> in an appropriate way.
> There are many reasons why such a semantic shift would be undesirable:
> - It opens up a Pandora's box with regard to the semantics of
> transformations on GEPs when commuted and combined with other GEPs
> - It results in less expressivity: frontends should emit the IR that matches
> the semantics of their source language. Constraining GEP semantics would
> constrain it for frontends which do not want or need this semantic shift.
> If the goal is to make (char*)0 + n work in clang, clang should be the
> bearer of that burden. It is not difficult to implement this in AST->IR
> lowering and has several benefits:
> - No change to GEP semantics which means that existing optimizations are
> sound.
> - Easy to explain why, when and how clang's behavior shifts with regard
> to particular source expressions and their lowering.

Just wanted to say that I emphatically agree with all of this.

(And with making the above craziness work in Clang as a pragmatic way to
support real code in the wild even if it is undesirable code in the wild.)
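
To make the Clang-side option concrete, here is a rough source-level sketch
in C of what treating (char*)0 + n as an integer-to-pointer conversion
amounts to. The function names offset_from_null and offset_from_null_u32
are hypothetical, chosen only for illustration, and this is not Clang's
actual AST->IR lowering code. It assumes a target where uintptr_t exists
and integer-to-pointer conversion preserves the numeric value, which is
implementation-defined in C.

```c
#include <stdint.h>

/* The expression under discussion is `(char *)0 + n`. In ISO C, adding a
 * nonzero offset to a null pointer is undefined behavior, and a frontend
 * that lowers it to a GEP with a null base inherits GEP semantics.
 *
 * The alternative sketched here expresses the same intent as an
 * integer-to-pointer cast (an inttoptr at the IR level), so no GEP with a
 * null base is ever emitted. The result of the cast is
 * implementation-defined in C. */
char *offset_from_null(uintptr_t n)
{
    return (char *)n;
}

/* Variant for an offset narrower than a pointer: the unsigned value is
 * zero-extended to pointer width before the cast, as suggested for
 * offsets of different types earlier in the thread. */
char *offset_from_null_u32(uint32_t n)
{
    return (char *)(uintptr_t)n;
}
```

With a lowering of this shape the frontend decides, per source expression,
when a null-based offset becomes a cast, and the optimizer never has to
treat a GEP on null specially.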