[PATCH] D10964: [Codegen] Add intrinsics 'hsum*' and corresponding SDNodes for horizontal sum operation.
Simon Pilgrim via llvm-commits
llvm-commits at lists.llvm.org
Wed Aug 19 01:12:37 PDT 2015
RKSimon added inline comments.
================
Comment at: test/CodeGen/X86/vec-hadd-int-256.ll:14
@@ +13,2 @@
+ ret i64 %1
+}
----------------
ashahid wrote:
> RKSimon wrote:
> > This codegen is the same as for the test1_hsum_int_i64 <2 x i64> version in vec-hadd-int-128.ll - something is going wrong. You should probably compare against codegen from an AVX2 target.
> With AVX2 the generated code differs, as shown below.
>
> **Case V2i64**
> vpshufd $78, %xmm0, %xmm1 # xmm1 = xmm0[2,3,0,1]
> vpaddq %xmm1, %xmm0, %xmm0
> vmovq %xmm0, %rax
> retq
>
>
> **Case V4i64**
> vextracti128 $1, %ymm0, %xmm1
> vpaddq %ymm1, %ymm0, %ymm0
> vpermq $237, %ymm0, %ymm1 # ymm1 = ymm0[1,3,2,3]
> vpaddq %ymm1, %ymm0, %ymm0
> vmovq %xmm0, %rax
> vzeroupper
> retq
So yes, it appears that something is wrong with the legalization. When you build for SSE you only get the hsum of the bottom <2 x i64>; when you build for AVX (which legalizes <4 x i64>) you get the hsum of the whole <4 x i64>.
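
To make the difference concrete, here is a rough Python model of the lane arithmetic in the two AVX2 sequences quoted above. The function names are made up for illustration, and the model assumes that vextracti128 leaves the upper lanes of the destination ymm register zeroed (the usual behaviour of a VEX-encoded xmm write). It is only a sketch of what the instructions compute, not of what the legalizer does.

```python
MASK64 = (1 << 64) - 1  # i64 lanes wrap around on overflow


def add_lanes(a, b):
    """Lane-wise 64-bit add, modelling vpaddq."""
    return [(x + y) & MASK64 for x, y in zip(a, b)]


def hsum_v2i64(v):
    """Model of the V2i64 sequence: vpshufd $78, vpaddq, vmovq."""
    swapped = [v[1], v[0]]          # xmm1 = xmm0[2,3,0,1] swaps the two i64 lanes
    return add_lanes(v, swapped)[0]  # vmovq reads lane 0


def hsum_v4i64(v):
    """Model of the V4i64 sequence: vextracti128, vpaddq, vpermq $237, vpaddq, vmovq."""
    high = [v[2], v[3], 0, 0]        # upper 128 bits moved down; upper lanes assumed zero
    s = add_lanes(v, high)
    perm = [s[1], s[3], s[2], s[3]]  # ymm1 = ymm0[1,3,2,3]
    return add_lanes(s, perm)[0]     # vmovq reads lane 0


vec = [1, 2, 3, 4]
full = hsum_v4i64(vec)             # sums all four lanes
bottom_only = hsum_v2i64(vec[:2])  # what the SSE build effectively computes
```

With the input above, the full reduction and the bottom-half reduction disagree, which is the symptom described: the SSE output for the <4 x i64> test matches the <2 x i64> test because the upper two lanes never enter the sum.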
http://reviews.llvm.org/D10964