[llvm-dev] GEP with a null pointer base

Sun Jul 9 18:23:52 PDT 2017

On Sun, Jul 9, 2017 at 1:10 PM, Marcin Słowik via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Can we go back a little?
>
>> 1) Add a new transformation to InstCombine that will replace
>> 'getelementptr i8, i8* null, <ty> %n' with 'inttoptr <ty> %n to i8*' when
>> <ty> has the same size as a pointer for the target architecture.
>
>
> What's the actual problem with this approach? I personally find it the
> most compelling - it is well-defined (well, somewhat), front-end agnostic
> (and I assume some front ends may find this kind of pointer arithmetic to be
> well-defined) and predictable.
> I would even extend it to allow offsets of different types to be used,
> with additional zero-extension when applicable.
>

This would make the correctness of a program depend on running a particular
optimization pass, which is not sound from a semantic point of view (what
if another pass sees the gep of null before InstCombine does?).

LLVM IR has semantics: properties that we are supposed to use to reason
about what a particular piece of IR does. The proposed transformation,
while legal, is not mandatory. Making it mandatory means more than adding
one particular change to InstCombine: it means a change to the semantics of
LLVM IR. It would require that all passes, analyses, etc. treat a gep of
null in the appropriate way.

There are many reasons why such a semantic shift would be undesirable:
- It opens up a Pandora's box with regard to the semantics of
transformations on GEPs when they are commuted and combined with other GEPs.
- It results in less expressivity: frontends should emit IR that matches
the semantics of their source language. Constraining GEP semantics would
constrain it for frontends which do not want or need this semantic shift.

If the goal is to make (char*)0 + n work in clang, clang should be the
bearer of that burden. It is not difficult to implement this in AST->IR
lowering, and doing so has several benefits:
- No change to GEP semantics, which means that existing optimizations remain
sound.
- Easy to explain why, when and how clang's behavior shifts with regard to
particular source expressions and their lowering.
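
To make the frontend-side idea concrete, here is a rough source-level sketch
of what such a lowering would amount to: the frontend treats (char*)0 + n as
an integer-to-pointer conversion (an inttoptr in IR) instead of emitting a
GEP with a null base. The helper names below are made up for illustration;
this is not a clang API and not a claim about how clang does or would
implement it. It also assumes a target where the conversion round-trips,
which C leaves implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a clang API): models lowering (char *)0 + n
 * as an integer-to-pointer cast rather than as pointer arithmetic on a
 * null base. The result of the cast is implementation-defined in C. */
char *null_plus_offset(uintptr_t n) {
    return (char *)n;
}

/* Assumption: on mainstream targets, converting the pointer back to an
 * integer yields the original offset. */
void self_check(void) {
    assert((uintptr_t)null_plus_offset(16) == 16);
}
```

The point of doing it at this level is that the IR never contains a gep of
null for this source expression, so no pass has to special-case it.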

>
> Cheers,
> Marcin
>