[PATCH] D10964: [Codegen] Add intrinsics 'hsum*' and corresponding SDNodes for horizontal sum operation.

Wed Aug 19 01:12:37 PDT 2015

RKSimon added inline comments.

================
Comment at: test/CodeGen/X86/vec-hadd-int-256.ll:14
@@ +13,2 @@
+  ret i64 %1
+}
----------------
ashahid wrote:
> RKSimon wrote:
> > This codegen is the same as for the test1_hsum_int_i64 <2 x i64> version in vec-hadd-int-128.ll - something is going wrong. You should probably compare against codegen from an AVX2 target.
> With AVX2 the generated code differs as below.
> 
> **Case V2i64**
>         vpshufd $78, %xmm0, %xmm1       # xmm1 = xmm0[2,3,0,1]
>         vpaddq  %xmm1, %xmm0, %xmm0
>         vmovq   %xmm0, %rax
>         retq
> 
> 
> **Case V4i64**
>         vextracti128    $1, %ymm0, %xmm1
>         vpaddq  %ymm1, %ymm0, %ymm0
>         vpermq  $237, %ymm0, %ymm1      # ymm1 = ymm0[1,3,2,3]
>         vpaddq  %ymm1, %ymm0, %ymm0
>         vmovq   %xmm0, %rax
>         vzeroupper
>         retq
So yes, something appears to be wrong with the legalization. When you build for SSE you only get the hsum of the bottom <2 x i64>; when you build for AVX (which legalizes <4 x i64>) you get the hsum of the whole <4 x i64>.
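
To make the difference concrete, here is a rough scalar model of the two AVX2 sequences quoted above, written as plain C++ rather than intrinsics. It is only a sketch: the function names (hsum_v2i64, hsum_v4i64, hsum_low_half_only) are made up for illustration and are not part of the patch, and hsum_low_half_only is my reading of what the SSE build effectively returns, not something taken from its actual output.

```cpp
#include <array>
#include <cstdint>

using V2 = std::array<std::uint64_t, 2>;
using V4 = std::array<std::uint64_t, 4>;

// Models the quoted V2i64 sequence (hypothetical helper name):
//   vpshufd $78   -> swap the two 64-bit halves
//   vpaddq        -> lane-wise add
//   vmovq         -> return lane 0
std::uint64_t hsum_v2i64(const V2& v) {
    V2 swapped = {v[1], v[0]};
    V2 sum = {v[0] + swapped[0], v[1] + swapped[1]};
    return sum[0];
}

// Models the quoted V4i64 sequence (hypothetical helper name):
//   vextracti128 $1 -> upper 128 bits moved to the low half, rest zeroed
//   vpaddq          -> lane-wise add
//   vpermq $237     -> select lanes [1,3,2,3]
//   vpaddq          -> lane-wise add
//   vmovq           -> return lane 0
std::uint64_t hsum_v4i64(const V4& v) {
    V4 hi = {v[2], v[3], 0, 0};
    V4 t = {v[0] + hi[0], v[1] + hi[1], v[2] + hi[2], v[3] + hi[3]};
    V4 p = {t[1], t[3], t[2], t[3]};
    V4 r = {t[0] + p[0], t[1] + p[1], t[2] + p[2], t[3] + p[3]};
    return r[0];
}

// Assumption: models the suspected SSE behaviour, where only the bottom
// <2 x i64> of the <4 x i64> input takes part in the sum.
std::uint64_t hsum_low_half_only(const V4& v) {
    return hsum_v2i64(V2{v[0], v[1]});
}
```

For an input of <1, 2, 3, 4>, hsum_v4i64 adds all four lanes, while hsum_low_half_only only adds the first two, so a test that checks the returned value rather than just the instruction sequence would catch the mismatch.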

http://reviews.llvm.org/D10964