[PATCH] D59669: [x86] use movmsk when extracting multiple lanes of a vector compare (PR39665)
Simon Pilgrim via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Mar 21 14:46:58 PDT 2019
RKSimon added inline comments.
================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:34440
+ bool CanUseMOVMSKPD = NewSetccVT == MVT::v2i64 && Subtarget.hasSSE2();
+ bool CanUsePMOVMSKB = NewSetccVT == MVT::v16i8 && Subtarget.hasSSE2();
+ if (!(CanUseMOVMSKPS || CanUseMOVMSKPD || CanUsePMOVMSKB))
----------------
With a little care we should be able to do v8i16 as well.
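A minimal sketch of what the v8i16 case could look like at the intrinsics level, assuming a signed greater-than compare. It is not the patch's DAG combine, and the helper names cmpgt_mask_v8i16 and lane_gt_v8i16 are hypothetical. The compare produces 0x0000 or 0xFFFF per lane, so either a saturating pack to bytes followed by PMOVMSKB yields one bit per lane, or PMOVMSKB is applied directly and lane i is read from bit 2*i:

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical helper: one mask bit per 16-bit lane of the signed compare a > b.
inline unsigned cmpgt_mask_v8i16(__m128i a, __m128i b) {
  __m128i cmp = _mm_cmpgt_epi16(a, b);        // each lane is 0x0000 or 0xFFFF
  __m128i bytes = _mm_packs_epi16(cmp, cmp);  // saturating pack: -1 -> 0xFF, 0 -> 0x00
  // The pack duplicates the 8 results into both halves; keep the low 8 bits.
  return static_cast<unsigned>(_mm_movemask_epi8(bytes)) & 0xFFu;
}

// Hypothetical alternative without the pack: PMOVMSKB on the v8i16 compare
// sets bits 2*i and 2*i+1 for lane i, so a lane is read from bit 2*i.
inline bool lane_gt_v8i16(__m128i a, __m128i b, unsigned lane) {
  unsigned m = static_cast<unsigned>(_mm_movemask_epi8(_mm_cmpgt_epi16(a, b)));
  return ((m >> (2 * lane)) & 1u) != 0;
}
```

The second form is presumably the "little care" needed: the mask has two bits per lane, so the lane index has to be scaled when picking the bit.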
================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:34480
+ }
+ return SDValue(ExtElt, 0); // ExtElt was replaced.
+}
----------------
Can any of the code from combineHorizontalPredicateResult or combineBitcastvxi1 be reused?
================
Comment at: llvm/test/CodeGen/X86/movmsk-cmp.ll:4926
+; SSE2-NEXT: shrl $3, %ecx
+; SSE2-NEXT: andl $4, %eax
+; SSE2-NEXT: shrl $2, %eax
----------------
Are these ANDs superfluous if we only need the LSB?
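To illustrate the question, here is a small scalar sketch, assuming the consumer of the extracted lane only ever inspects bit 0. The function names are hypothetical and the code models the andl/shrl pair from the check lines rather than anything in the patch itself:

```cpp
#include <cstdint>

// Lane 2 extracted the way the check lines do it: andl $4, then shrl $2.
inline uint32_t lane2_with_and(uint32_t mask) { return (mask & 4u) >> 2; }

// Shift only: bit 0 holds lane 2, higher bits hold leftover lanes.
inline uint32_t lane2_shift_only(uint32_t mask) { return mask >> 2; }

// Hypothetical consumer that only looks at the least significant bit.
inline bool lsb(uint32_t v) { return (v & 1u) != 0; }

// Checks that both forms agree for every possible 4-bit movmsk result.
inline bool forms_agree_on_lsb() {
  for (uint32_t m = 0; m < 16; ++m)
    if (lsb(lane2_with_and(m)) != lsb(lane2_shift_only(m)))
      return false;
  return true;
}
```

If the user needs a clean 0/1 value in the whole register rather than just bit 0, the two forms differ and the AND (or an equivalent mask) is still required.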
================
Comment at: llvm/test/CodeGen/X86/movmsk-cmp.ll:5136
+; SSE2-NEXT: cmovel %edx, %eax
; SSE2-NEXT: retq
;
----------------
For the anyof/allof cases it would be much better if we could merge the tests into a single compare. In c-ray that would let us merge the multiple cmp+jmp pairs; two separate jmps packed so closely together is a known perf issue.
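A scalar sketch of the merged form, assuming the lanes of interest are described by a bit mask over the movmsk result. All names here are hypothetical and the code only models the idea, not the generated code:

```cpp
#include <cstdint>

// Unmerged form: one test per lane, which tends to become one cmp+jmp each.
inline bool any_of_lanes_0_3_separate(uint32_t mask) {
  return ((mask >> 0) & 1u) != 0 || ((mask >> 3) & 1u) != 0;
}
inline bool all_of_lanes_0_3_separate(uint32_t mask) {
  return ((mask >> 0) & 1u) != 0 && ((mask >> 3) & 1u) != 0;
}

// Merged form: a single AND plus a single compare against a constant.
inline bool any_of_lanes(uint32_t mask, uint32_t lanes) {
  return (mask & lanes) != 0;      // anyof: some selected bit is set
}
inline bool all_of_lanes(uint32_t mask, uint32_t lanes) {
  return (mask & lanes) == lanes;  // allof: every selected bit is set
}

// Checks that merged and unmerged forms agree for every 4-bit movmsk result.
inline bool merged_matches_separate() {
  const uint32_t lanes = 0x9u;     // lanes 0 and 3
  for (uint32_t m = 0; m < 16; ++m) {
    if (any_of_lanes(m, lanes) != any_of_lanes_0_3_separate(m)) return false;
    if (all_of_lanes(m, lanes) != all_of_lanes_0_3_separate(m)) return false;
  }
  return true;
}
```

With the merged form each predicate needs only one conditional branch, which is the shape the comment is asking for.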
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D59669/new/
https://reviews.llvm.org/D59669