[PATCH] D38546: [ConstantFolding] Avoid assert when folding ptrtoint of vectorized GEP

Tue Oct 17 14:53:40 PDT 2017

bjope added inline comments.

================
Comment at: test/Analysis/ConstantFolding/cast-vector.ll:10
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    ret <2 x i16> ptrtoint (<2 x i32*> getelementptr ([10 x i32], [10 x i32]* null, <2 x i64> zeroinitializer, <2 x i64> <i64 5, i64 7>) to <2 x i16>)
+;
----------------
filcab wrote:
> bjope wrote:
> > filcab wrote:
> > > This is invalid IR (per langref). the `ptrtoint` constant expression only takes a pointer type argument:
> > > 
> > > > ptrtoint (CST to TYPE)
> > > > Convert a pointer typed constant to the corresponding integer constant. TYPE must be an integer type. CST must be of pointer type. The CST value is zero extended, truncated, or unchanged to make it fit in TYPE.
> > > 
> > > It seems to me that one of the following is the best fix:
> > >   - Fix whatever is creating a `ptrtoint` constant expression with a vector argument (make sure to also add an assert to make sure it doesn't happen again)
> > >   - Make `ptrtoint` of vectors a valid constant expression, which involves changing the langref and making sure these kinds of folds are valid.
> > > 
> > > 
> > > 
> > @filcab: Are you sure?
> > 
> > AFAIK this is valid according to verifiers. And it seems to work.
> > The examples for both ptrtoint and inttoptr in the "Instruction Reference" part of the langref includes examples with vectors. For ptrtoint the version of the langref that I've been looking at says:
> > 
> > > The ‘ptrtoint‘ instruction takes a value to cast, which must be a value of type pointer or a vector of pointers, and a type to cast it to ty2, which must be an integer or a vector of integers type.
> > 
> > I assume that you quoted the description from the "Constant Expressions" section of the langref. 
> > Are you saying that there is a limitaiton for constant expressions, and that the Instruction Reference part of the langref only is valid for non-constant expressions?
> > 
> > Besides, you say that I should "fix whatever is creating a ptrtoint constant expression with a vector argument". My patch is about bailing out from ConstantFoldCastInstruction without doing any folding.  Basically leaving the ptrtoint as it was in the input (which follows the example from the Instruction Reference part of the langref.... as the arguments isn't constant folded I guess it can't be seen as a constant expression...).
> The `ptrtoint` instruction in the input for your test is ok per langref.
> The `ptrtoint` constant expression on the output you expect is not.
> 
> If you look at your `CHECK` line, you're expecting an output like:
> `ret <2 x i16> ptrtoint (<2 x i32*> getelementptr ([10 x i32], [10 x i32]* null, <2 x i64> zeroinitializer, <2 x i64> <i64 5, i64 7>) to <2 x i16>)`
> 
> This is a `ret` of a `ptrtoint` constant expression which takes a `getelementptr` constant expression. On the input, you have instructions, which are different.  `ConstantFoldCastInstruction` is currently being called on malformed IR which includes that `ptrtoint` constant expression. The IR is already bad (according to langref), so it's not useful to have `ConstantFoldCastInstruction` instead of stopping whatever emitted that IR. (That's just one of the possible fixes)
> 
So you are saying that constant expressions has special limitations, and that the Instruction Reference part of the langref only is valid for non-constant expressions?
(note that vector types are `first class` types, and the Constant Expressions section of the langref do say "Constant expressions may be of any first class type")

Is the following also malformed IR then?
```
%bar = add <2 x i16> %foo, trunc (<2 x i32> <i32 78, i32 99> to <2 x i16>)
```
I think it is weird that using vector types in constant expressions with cast operations works just fine if it isn't allowed. Is it perhaps just the langref that is a little bit vague regarding this?

Nevertheless, I do not think that the IR has been "malformed" yet when I hit the assert. My fix is just about bailing out when analysing a `ptrtoint` expression,  very much in the same way as it already is done for `trunc` inside `ConstantFoldCastInstruction`.

Maybe it is a pity that the constant GEP expression isn't constant evaluated, reducing everything into a ret with a <2 x i16> with two literals. But that is kind of a different problem, and I'm not sure how common it is to have GEPs that are based on null pointers (only way to get a GEP that is a constant expression?), so maybe adding logic for that isn't worth the trouble.

As a comparison, if I change the test case to this (equivalent code but without the vector gep):

```
  %gep1 = getelementptr inbounds [10 x i32], [10 x i32]* null, i16 0, i16 5
  %gep2 = getelementptr inbounds [10 x i32], [10 x i32]* null, i16 0, i16 7
  %gepvec1 = insertelement <2 x i32*> undef, i32* %gep1, i32 0
  %gepvec2 = insertelement <2 x i32*> %gepvec1, i32* %gep2, i32 1
  %vec = ptrtoint <2 x i32*> %gepvec2 to <2 x i16>
  ret <2 x i16> %vec
```
then instsimplify will reduce it to

```
  ret <2 x i16> <i16 ptrtoint (i32* inttoptr (i64 20 to i32*) to i16), i16 ptrtoint (i32* inttoptr (i64 28 to i32*) to i16)>
```
while instcombine will reduce it to

```
  ret <2 x i16> <i16 20, i16 28>
```

So someone has bothered about evaluating constant GEPs that aren't vectors. But it also show that instsimplify doesn't reduce the ptrtoint and inttoptr fully, not even for the scalar case.

The IR produced in my test case:
```
ret <2 x i16> ptrtoint (<2 x i32*> getelementptr ([10 x i32], [10 x i32]* null, <2 x i64> zeroinitializer, <2 x i64> <i64 5, i64 7>) to <2 x i16>)
```
is accepted both by the verify pass, and llc. So instead of an assert, my original test case (C code) compiles into valid assembler. And I get the same assembler result after llc for all these variants of ret.

https://reviews.llvm.org/D38546