[PATCH] D34077: DAGCombine: Combine BUILD_VECTOR to TRUNCATE

Mon Jun 26 00:36:53 PDT 2017

zvi added a comment.

================
Comment at: test/CodeGen/ARM/vpadd.ll:221
 ; CHECK-NEXT:    vld1.64 {d16, d17}, [r0]
-; CHECK-NEXT:    vpadd.i8 d16, d16, d17
+; CHECK-NEXT:    vorr d18, d16, d16
+; CHECK-NEXT:    vorr d19, d17, d17
----------------
efriedma wrote:
> zvi wrote:
> > This may be a similar issue to https://reviews.llvm.org/D32993#inline-286079.
> > Will look into this.
> The problem here is pretty straightforward: on trunk, we have two shuffles (which each get lowered to one of the two outputs of ARMISD::VUZP).  With your patch, we have one ARMISD::VUZP, and one ISD::TRUNCATE. The ARM backend doesn't try to reason about this equivalence at all.
> 
> (This also eventually blocks the VUZP+VADD->VPADD combine, but that isn't really the important part.)
After looking into this some more (apologies for the delayed response; I got held up with other projects):
This specific case is the simplest one and can be fixed by generalizing the combine.
Unfortunately, cases such as addCombineToVPADDLq_s8 below seem much more difficult. By the time VUZP is matched, the DAG has expanded to:
   t0: ch = EntryToken
     t2: i32,ch = CopyFromReg t0, Register:i32 %vreg0
   t55: v2f64,ch = load<LD16[%cbcr](align=8)> t0, t2, undef:i32
   t56: v16i8 = bitcast t55
               t26: v8i16 = bitcast t56
             t51: v8i16 = ARMISD::VSHL t26, Constant:i32<8>
                 t46: v4i32 = ARMISD::VMOVIMM TargetConstant:i32<0>
               t47: v8i16 = bitcast t46
               t58: v8i16 = ARMISD::VMOVIMM TargetConstant:i32<2056>
             t48: v8i16 = sub t47, t58
           t50: v8i16 = llvm.arm.neon.vshifts Constant:i32<693>, t51, t48
               t37: v8i8 = extract_subvector t56, Constant:i32<0>
               t36: v8i8 = extract_subvector t56, Constant:i32<8>
             t54: v8i8,v8i8 = ARMISD::VUZP t37, t36
           t29: v8i16 = sign_extend t54:1
         t30: v8i16 = add t50, t29
       t52: v2f64 = bitcast t30
       t4: i32,ch = CopyFromReg t0, Register:i32 %vreg1
     t53: ch = store<ST16[%X](align=8)> t55:1, t52, t4, undef:i32
   t32: ch = ARMISD::RET_FLAG t53

So I don't see any prospect of enabling this combine for ARM without some major work on the ARM backend.
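
To make the mismatch concrete, here is a small Python model of the lane arithmetic. It is only a sketch and not LLVM code: the helper names are made up for illustration, it assumes little-endian lane order, and it assumes my reading of the dump is right (t50 is a shift left by 8 followed by an arithmetic shift right by 8 on each i16 lane, i.e. a sign extension of the low byte). Under those assumptions, the even-lane half of the unzip is now expressed as truncate/shift nodes on the v8i16 bitcast, while only the odd-lane half (t54:1) is still a VUZP result, so a pattern that expects both halves to come from one VUZP no longer matches even though the values are the same.

```python
# Toy model of the lane arithmetic, assuming little-endian lane order.
# All helper names are hypothetical and exist only for this illustration.

def vuzp(lo, hi):
    """Unzip: return (even-indexed lanes, odd-indexed lanes) of lo ++ hi."""
    both = lo + hi
    return both[0::2], both[1::2]

def bitcast_v16i8_to_v8i16(b):
    """Reinterpret 16 bytes as 8 little-endian 16-bit lanes."""
    return [b[2 * i] | (b[2 * i + 1] << 8) for i in range(8)]

def truncate_to_i8(lanes):
    """Keep the low 8 bits of each 16-bit lane."""
    return [x & 0xFF for x in lanes]

def sext8(x):
    """Interpret an unsigned byte value as a signed 8-bit integer."""
    return x - 0x100 if x & 0x80 else x

def shl8_then_ashr8(lanes):
    """Shift each i16 lane left by 8, then arithmetic-shift right by 8."""
    out = []
    for x in lanes:
        shifted = (x << 8) & 0xFFFF
        signed = shifted - 0x10000 if shifted & 0x8000 else shifted
        out.append(signed >> 8)
    return out

def vpaddl_s8(b):
    """Pairwise widening add of adjacent signed bytes."""
    return [sext8(b[2 * i]) + sext8(b[2 * i + 1]) for i in range(8)]

# Sample input covering both positive and negative byte values.
data = [(i * 17) & 0xFF for i in range(16)]
evens, odds = vuzp(data[:8], data[8:])
wide = bitcast_v16i8_to_v8i16(data)

# Shape of the DAG above: (shl/ashr of the bitcast) + sign_extend(VUZP:1).
dag_form = [a + sext8(b) for a, b in zip(shl8_then_ashr8(wide), odds)]
```

In this model `truncate_to_i8(wide)` equals `evens` and `dag_form` equals `vpaddl_s8(data)`, which is the equivalence the backend would have to recognize in order to still form VPADDL.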

https://reviews.llvm.org/D34077