[PATCH] D97459: [CodeGen] Fix issues with subvector intrinsic index types

Thu Feb 25 07:27:02 PST 2021

sdesmalen accepted this revision.
sdesmalen added a comment.
This revision is now accepted and ready to land.

That one test is either already broken, or I'm missing some important detail. In any case, the changes in this patch look good to me.

================
Comment at: llvm/test/CodeGen/AArch64/vecreduce-and-legalization.ll:104
+; CHECK-NEXT:    mov v1.b[11], w8
+; CHECK-NEXT:    mov v1.b[13], w8
+; CHECK-NEXT:    ext v1.16b, v1.16b, v1.16b, #8
----------------
frasercrmck wrote:
> frasercrmck wrote:
> > sdesmalen wrote:
> > > Probably not something that's caused by your patch, but I would have expected that elements `v1.b[12], v1.b[14] and v1.b[15]` to also be set to `-1`?
> > Perhaps. I'll admit I don't know much about AArch64 but that makes sense. My patch has taken us from:
> > 
> > ```
> >   t0: ch = EntryToken
> >   t2: v16i8,ch = CopyFromReg t0, Register:v16i8 %0
> >   t13: v16i8 = insert_vector_elt t2, Constant:i32<-1>, Constant:i64<9>
> >   t15: v16i8 = insert_vector_elt t13, Constant:i32<-1>, Constant:i64<10>
> >   t17: v16i8 = insert_vector_elt t15, Constant:i32<-1>, Constant:i64<11>
> >   t19: v16i8 = insert_vector_elt t17, Constant:i32<-1>, Constant:i64<12>
> >   t21: v16i8 = insert_vector_elt t19, Constant:i32<-1>, Constant:i64<13>
> >   t29: v8i8 = extract_subvector t21, Constant:i64<0>
> >   t84: v16i8 = insert_subvector undef:v16i8, t29, Constant:i32<0>
> >   t86: i32 = extract_vector_elt t84, Constant:i64<6>
> >   // and other extracts from t84 of indices < 8
> >   // use t21 elsewhere
> > ```
> > 
> > to
> > 
> > ```
> >   t0: ch = EntryToken
> >   t2: v16i8,ch = CopyFromReg t0, Register:v16i8 %0
> >   t13: v16i8 = insert_vector_elt t2, Constant:i32<-1>, Constant:i64<9>
> >   t15: v16i8 = insert_vector_elt t13, Constant:i32<-1>, Constant:i64<10>
> >   t17: v16i8 = insert_vector_elt t15, Constant:i32<-1>, Constant:i64<11>
> >   t19: v16i8 = insert_vector_elt t17, Constant:i32<-1>, Constant:i64<12>
> >   t21: v16i8 = insert_vector_elt t19, Constant:i32<-1>, Constant:i64<13>
> >   t86: i32 = extract_vector_elt t2, Constant:i64<6>
> >   // and other extracts from t2 of indices < 8
> >   // use t21 elsewhere
> > ```
> > 
> > Which I think is a good thing, since there was previously a redundant extract/insert going on, and a dependency on a vector whose elements different from `t2` weren't demanded.
> > 
> > Now as to *why* this is happening, there's this code in `SelectionDAG.cpp` when creating an `ISD::INSERT_SUBVECTOR`:
> > 
> > ```
> >     // If this is an insert of an extracted vector into an undef vector, we
> >     // can just use the input to the extract.
> >     if (N1.isUndef() && N2.getOpcode() == ISD::EXTRACT_SUBVECTOR &&
> >         N2.getOperand(1) == N3 && N2.getOperand(0).getValueType() == VT)
> >       return N2.getOperand(0);
> > ```
> > 
> > And I think it would be here `N2.getOperand(1) == N3` failed because you were mixing `i32` and `i64` types between insertion and extraction.
> Why the insert to element `12` is being removed, I'm not sure.
> 
> ```
> Replacing.2 t19: v16i8 = insert_vector_elt t17, Constant:i32<-1>, Constant:i64<12>
> 
> With: t17: v16i8 = insert_vector_elt t15, Constant:i32<-1>, Constant:i64<11>
> ```
> 
> That's pretty odd, but then again there's nothing in elements 8, 14, or 15. It seems to be an extension of what is happening to the other 3 elements before my patch:
> 
> ```
> Replacing.2 t25: v16i8 = insert_vector_elt t23, Constant:i32<-1>, Constant:i64<15>
> 
> With: t23: v16i8 = insert_vector_elt t21, Constant:i32<-1>, Constant:i64<14>
> ```
> 
> Since the test is doing `v8i8 and` of the low half of a vector with this upper half filled with `-1`s, presumably it can tell at some point that it's redundant and it can just use the low half. I don't know why it doesn't remove all of the `-1`s though.
If I visualise what SelectionDAG was/is trying to do here:
```reduce.and(<e0, e1, e2, e3, e4, e5, e6, e7, e8, ??, ??, ??, ??, ??, ??, ??>)
                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^
                                           must each be -1 for the AND reduction
=>
           <e0, e1, e2, e3, e4, e5, e6, e7, e8, -1, -1, -1, -1, -1, -1, -1>

=> this is somehow optimized to:
           <e0, e1, e2, e3, e4, e5, e6, e7, e8, -1, -1, -1, ??, -1, ??, ??>

it then extracts the Lo/Hi part of the vector,:
           Lo = <e0, e1, e2, e3, e4, e5, e6, e7>
           Hi = <e8, -1, -1, -1, ??, -1, ??, ??>

Performs a vector-AND:
       AND(<e0, e1, e2, e3, e4, e5, e6, e7>,
           <e8, -1, -1, -1, ??, -1, ??, ??>)
       <=>
           <aa, bb, cc, dd, ??, ff, ??, ??>

And then does an element-wise reduction:
  ...
    AND(??,       
      AND(dd,
        AND(cc,
          AND(aa, bb)
```
The `??` elements could still be anything, so if any of these elements is 0, then the reduction result is also 0. So unless I'm missing something, there seems to be something broken here, already before your patch.

The inserts of `-1` into element 14 and 15 were removed by https://github.com/llvm/llvm-project/commit/fc2b4a02b1a82c40ac1459cd15b9911ebfc78acc.

(the output from selectiondag before this patch:
```Vector/type-legalized selection DAG: %bb.0 'test_v9i8:'
SelectionDAG has 48 nodes:
  t0: ch = EntryToken
                t2: v16i8,ch = CopyFromReg t0, Register:v16i8 %0
              t13: v16i8 = insert_vector_elt t2, Constant:i32<-1>, Constant:i64<9>
            t15: v16i8 = insert_vector_elt t13, Constant:i32<-1>, Constant:i64<10>
          t17: v16i8 = insert_vector_elt t15, Constant:i32<-1>, Constant:i64<11>
        t19: v16i8 = insert_vector_elt t17, Constant:i32<-1>, Constant:i64<12>
      t21: v16i8 = insert_vector_elt t19, Constant:i32<-1>, Constant:i64<13>
    t23: v16i8 = insert_vector_elt t21, Constant:i32<-1>, Constant:i64<14>
  t25: v16i8 = insert_vector_elt t23, Constant:i32<-1>, Constant:i64<15>
                  t56: i32 = extract_vector_elt t32, Constant:i64<0>
                  t57: i32 = extract_vector_elt t32, Constant:i64<1>
                t58: i32 = and t56, t57
                t59: i32 = extract_vector_elt t32, Constant:i64<2>
              t60: i32 = and t58, t59
              t61: i32 = extract_vector_elt t32, Constant:i64<3>
            t62: i32 = and t60, t61
            t63: i32 = extract_vector_elt t32, Constant:i64<4>
          t64: i32 = and t62, t63
          t65: i32 = extract_vector_elt t32, Constant:i64<5>
        t66: i32 = and t64, t65
        t67: i32 = extract_vector_elt t32, Constant:i64<6>
      t68: i32 = and t66, t67
      t69: i32 = extract_vector_elt t32, Constant:i64<7>
    t70: i32 = and t68, t69
  t8: ch,glue = CopyToReg t0, Register:i32 $w0, t70
    t29: v8i8 = extract_subvector t25, Constant:i64<0>
    t31: v8i8 = extract_subvector t25, Constant:i64<8>
  t32: v8i8 = and t29, t31
  t9: ch = AArch64ISD::RET_FLAG t8, Register:i32 $w0, t8:1

Optimized vector-legalized selection DAG: %bb.0 'test_v9i8:'
SelectionDAG has 44 nodes:
  t0: ch = EntryToken
            t2: v16i8,ch = CopyFromReg t0, Register:v16i8 %0
          t13: v16i8 = insert_vector_elt t2, Constant:i32<-1>, Constant:i64<9>
        t15: v16i8 = insert_vector_elt t13, Constant:i32<-1>, Constant:i64<10>
      t17: v16i8 = insert_vector_elt t15, Constant:i32<-1>, Constant:i64<11>
    t19: v16i8 = insert_vector_elt t17, Constant:i32<-1>, Constant:i64<12>
  t21: v16i8 = insert_vector_elt t19, Constant:i32<-1>, Constant:i64<13>
                  t56: i32 = extract_vector_elt t32, Constant:i64<0>
                  t57: i32 = extract_vector_elt t32, Constant:i64<1>
                t58: i32 = and t56, t57
                t59: i32 = extract_vector_elt t32, Constant:i64<2>
              t60: i32 = and t58, t59
              t61: i32 = extract_vector_elt t32, Constant:i64<3>
            t62: i32 = and t60, t61
            t75: i32 = extract_vector_elt t29, Constant:i64<4>
          t64: i32 = and t62, t75
          t65: i32 = extract_vector_elt t32, Constant:i64<5>
        t66: i32 = and t64, t65
        t72: i32 = extract_vector_elt t29, Constant:i64<6>
      t68: i32 = and t66, t72
      t71: i32 = extract_vector_elt t29, Constant:i64<7>
    t70: i32 = and t68, t71
  t8: ch,glue = CopyToReg t0, Register:i32 $w0, t70
  t29: v8i8 = extract_subvector t21, Constant:i64<0>
    t74: v8i8 = extract_subvector t21, Constant:i64<8>
  t32: v8i8 = and t29, t74
  t9: ch = AArch64ISD::RET_FLAG t8, Register:i32 $w0, t8:1
```

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97459/new/

https://reviews.llvm.org/D97459