[PATCH] Fold EXTRACT_VECTOR_ELT(BUILD_VECTOR(Elt[0], ...), CstX ) -> Elt[CstX]

Thu Apr 23 15:34:29 PDT 2015

The combine seems sound, but ARM definitely needs to learn about special constants before this can go in.


================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:11004
@@ +11003,3 @@
+  // Fold EXTRACT_VECTOR_ELT(BUILD_VECTOR(Elt[0], ...), CstX ) -> Elt[CstX]
+  if (InVec.getOpcode() == ISD::BUILD_VECTOR && ConstEltNo) {
+    auto Elt = InVec.getOperand(N->getConstantOperandVal(1));
----------------
Would it make sense to check that the BUILD_VECTOR .hasOneUse()?  


================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:11010
@@ +11009,3 @@
+    // when folding this case.
+    if(Elt.getValueType() == NVT)
+      return Elt;
----------------
space before (

================
Comment at: test/CodeGen/AArch64/fold-constants.ll:6-8
@@ -5,5 +5,2 @@
 ; CHECK:       movi d0, #0000000000000000
-; CHECK-NEXT:  umov w8, v0.b[2]
-; CHECK-NEXT:  sbfx w8, w8, #0, #1
-; CHECK-NEXT:  fmov s0, w8
 ; CHECK-NEXT:  fmov x0, d0
----------------
This looks fine to me, but we can do better, I think.  The code looks like it's just shuffling zeros around, so I'd expect:
    mov x0, xzr

================
Comment at: test/CodeGen/AArch64/fold-constants.ll:18-19
@@ -17,4 +14,4 @@
   %vset_lane = insertelement <4 x i16> undef, i16 %vgetq_lane285, i32 0
   %4 = bitcast <4 x i16> %vset_lane to <1 x i64>
   %vget_lane = extractelement <1 x i64> %4, i32 0
   ret i64 %vget_lane
----------------
This is the part where the combiners get confused.  We could catch this with:

    (v1i64 bitcast (v4i16 build_vector (i16 C), undef, undef, undef))
        ->
    (v1i64 bitcast (i64 zext/sext (i16 C)))

It's not clear to me that's better though.  This would be more legitimate:

    (i64 bitcast (v4i16 build_vector (i16 C), undef, undef, undef))
        ->
    (i64 zext/sext (i16 C))

But for this testcase, we'd also need:

    (i64 extract_element (v1i64 bitcast N), i32 0)
        ->
    (i64 bitcast N)

And that might be a whole 'nother mess on AArch64 (it might be very useful as well, I'm not sure).

I'll have a closer look if you don't beat me to it ;)

================
Comment at: test/CodeGen/ARM/vmov.ll:3
@@ -2,1 +2,3 @@
 
+; XFAIL: *
+
----------------
These failures are caused by the logic in LowerBUILD_VECTOR (recognizing splats) not firing anymore.  With the combine, the vector doesn't survive until lowering, so we're stuck with a magic f64 constant that happens to have the same bitpattern as a vector splat.

As you say, ARM probably should learn about recognizing special patterns in LowerConstantFP and whatnot.  That would let it materialize stuff like:

    double 0x0808080808080808

using MOVI instead of constant pools (judging by the failure, I don't think it does).  Feel free to file a PR and/or give it a shot;  I'll try to have a look myself.

================
Comment at: test/CodeGen/R600/ds_read2.ll:220-221
@@ -223,1 +219,4 @@
+; SI: ds_read2_b32
+; SI-NOT: ds_read_b32
+; SI-NOT: ds_read_b32
 ; SI: s_endpgm
----------------
A single -NOT is enough

http://reviews.llvm.org/D9094

EMAIL PREFERENCES
  http://reviews.llvm.org/settings/panel/emailpreferences/