[PATCH] D54392: [DAGCombiner] look through bitcasts when trying to narrow vector binops

Mon Nov 19 18:33:49 PST 2018

efriedma added a comment.

It's probably okay to canonicalize the way you are, but you're hitting a missing pattern for AArch64.  Something like the following appears to work:

  def : Pat<(sub (extract_subvector (zext v8i8:$LHS), (i64 0)),
                 (extract_subvector (zext v8i8:$RHS), (i64 0))),
            (EXTRACT_SUBREG (USUBLv8i8_v8i16 v8i8:$LHS, v8i8:$RHS), dsub)>;

Of course, it needs to be rewritten to match all the relevant types and operations.  x86 doesn't really have those sorts of operations, I guess?
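
As a rough, untested sketch of what that generalization might look like for the 8-bit case: the multiclass name below is hypothetical, and the add/signed instruction names are assumed to follow the same naming scheme as USUBLv8i8_v8i16.  The wider element types would need analogous entries.

```
// Hypothetical helper (name is an assumption, not something in the tree).
// Matches a binop on the low halves of two extended v8i8 values and
// selects the corresponding "long" instruction, keeping only the low
// 64 bits of its result.
multiclass NarrowedLongBinOpPat<SDPatternOperator op, SDPatternOperator ext,
                                Instruction INST> {
  def : Pat<(op (extract_subvector (ext v8i8:$LHS), (i64 0)),
                (extract_subvector (ext v8i8:$RHS), (i64 0))),
            (EXTRACT_SUBREG (INST v8i8:$LHS, v8i8:$RHS), dsub)>;
}

// Unsigned and signed variants; instruction names other than
// USUBLv8i8_v8i16 are assumed by analogy.
defm : NarrowedLongBinOpPat<sub, zext, USUBLv8i8_v8i16>;
defm : NarrowedLongBinOpPat<add, zext, UADDLv8i8_v8i16>;
defm : NarrowedLongBinOpPat<sub, sext, SSUBLv8i8_v8i16>;
defm : NarrowedLongBinOpPat<add, sext, SADDLv8i8_v8i16>;
```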

(performAddSubLongCombine probably also should be extended, but that's not what you're seeing.)

================
Comment at: test/CodeGen/AArch64/arm64-ld1.ll:918-919
 ; CHECK-NEXT: ld1r.2s { [[ARG2:v[0-9]+]] }, [x1]
-; CHECK-NEXT: usubl.8h v[[RESREGNUM:[0-9]+]], [[ARG1]], [[ARG2]]
+; CHECK-NEXT: ushll.8h [[ARG1]], [[ARG1]], #0
+; CHECK-NEXT: ushll.8h [[ARG2]], [[ARG2]], #0
+; CHECK-NEXT: sub.4h v[[RESREGNUM:[0-9]+]], [[ARG1]], [[ARG2]]
----------------
spatel wrote:
> Side note for the ARM folks - I think this applies here?
> 
> ```
> UXTL{2} <Vd>.<Ta>, <Vn>.<Tb>
> is equivalent to
> USHLL{2} <Vd>.<Ta>, <Vn>.<Tb>, #0
> and is the preferred disassembly...
> ```
Not sure why the alias isn't getting automatically applied; please file a bug.

================
Comment at: test/CodeGen/X86/i64-mem-copy.ll:95
 ; X32AVX-NEXT:    vextracti128 $1, %ymm0, %xmm0
+; X32AVX-NEXT:    vpaddw %xmm1, %xmm0, %xmm0
 ; X32AVX-NEXT:    vmovq %xmm0, (%eax)
----------------
This appears to be one more instruction... but maybe it's worth avoiding 256-bit operations on x86?

https://reviews.llvm.org/D54392