[PATCH] D11439: [X86][SSE] Vectorize i64 ASHR operations
Simon Pilgrim
llvm-dev at redking.me.uk
Tue Jul 28 10:47:22 PDT 2015
RKSimon added a comment.
Thanks Quentin, if it's not too much trouble, please could you check the sibling patch to this one (http://reviews.llvm.org/D11327)?
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:17457
@@ +17456,3 @@
+ // M = SIGN_BIT u>> A
+ // R s>> a === ((R u>> A) ^ M) - M
+ if ((VT == MVT::v2i64 || (VT == MVT::v4i64 && Subtarget->hasInt256())) &&
----------------
qcolombet wrote:
> It wasn’t immediately clear to me that s>> and u>> referred to signed and unsigned shifts.
> Use lshr and ashr instead, as in LLVM IR (or the SD name variants if you prefer).
No problem.
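For reference, here is a minimal scalar sketch of that identity in C++ (my own illustration, not code from the patch):

  #include <cstdint>

  // Illustration only: emulate a 64-bit arithmetic shift right using a
  // logical shift, via M = SIGN_BIT lshr A and R ashr A == ((R lshr A) ^ M) - M.
  // The logical shift fills the top bits with zeros; xor-ing in the shifted
  // sign-bit mask M and then subtracting it sign-extends negative values.
  static int64_t ashr_via_lshr(int64_t r, unsigned a) {
    uint64_t m = (UINT64_C(1) << 63) >> a;      // M = SIGN_BIT lshr A
    uint64_t u = static_cast<uint64_t>(r) >> a; // R lshr A
    return static_cast<int64_t>((u ^ m) - m);   // ((R lshr A) ^ M) - M
  }

The vector lowering applies the same trick per lane, which is what lets it avoid needing a native 64-bit arithmetic shift.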
================
Comment at: test/CodeGen/X86/vector-shift-ashr-128.ll:27
@@ -27,1 +26,3 @@
+; SSE2-NEXT: xorpd %xmm4, %xmm2
+; SSE2-NEXT: psubq %xmm4, %xmm2
; SSE2-NEXT: movdqa %xmm2, %xmm0
----------------
qcolombet wrote:
> Is this sequence actually better?
>
> I guess the GPR-to-vector and vector-to-GPR copies are expensive enough that this is the case.
> Just double checking.
It's very target-dependent - on my old Penryn, avoiding GPR/SSE transfers is by far the best option. Jaguar/Sandy Bridge don't care that much at the 64-bit integer level (it will probably come down to register pressure issues). Haswell has AVX2, so we can do per-lane v2i64 logical shifts, which is where this patch really flies. In 32-bit mode it's always best to avoid trying to do (split) 64-bit shifts on GPRs (see D11327).
Overall, the general per-lane ashr v2i64 lowering is the weakest improvement, but the splat and constant cases gain a lot more.
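To make the SSE2 sequence above concrete, here is a rough C++ intrinsics sketch of the splat-amount case (a hypothetical helper, not the actual codegen; the compiler may well pick xorpd/psubq for the tail, as in the test diff above):

  #include <cstdint>
  #include <emmintrin.h> // SSE2

  // Sketch of the splat-amount v2i64 ashr emulation: SSE2 has 64-bit logical
  // shifts (psrlq) but no 64-bit arithmetic shift, so the xor/sub trick is
  // applied per lane. Everything stays in the vector domain, so there are no
  // GPR<->SSE transfers.
  static __m128i ashr_v2i64_splat(__m128i r, int amt) {
    const __m128i a    = _mm_cvtsi32_si128(amt);
    const __m128i sign = _mm_set1_epi64x(INT64_MIN);
    const __m128i m = _mm_srl_epi64(sign, a);     // M = SIGN_BIT lshr A
    const __m128i u = _mm_srl_epi64(r, a);        // R lshr A
    return _mm_sub_epi64(_mm_xor_si128(u, m), m); // ((R lshr A) ^ M) - M
  }

The variable per-lane case needs a bit more shuffling on plain SSE2, which is why AVX2's per-lane logical shifts help so much.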
Repository:
rL LLVM
http://reviews.llvm.org/D11439