[PATCH] D59669: [x86] use movmsk when extracting multiple lanes of a vector compare (PR39665)

Simon Pilgrim via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Mar 21 14:46:58 PDT 2019


RKSimon added inline comments.


================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:34440
+  bool CanUseMOVMSKPD = NewSetccVT == MVT::v2i64 && Subtarget.hasSSE2();
+  bool CanUsePMOVMSKB = NewSetccVT == MVT::v16i8 && Subtarget.hasSSE2();
+  if (!(CanUseMOVMSKPS || CanUseMOVMSKPD || CanUsePMOVMSKB))
----------------
With a little care we should be able to do v8i16 as well.
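
One possible reading of "a little care", as a scalar sketch and not the patch's code: treat the v8i16 compare result as v16i8 and use PMOVMSKB. Each 16-bit lane of a compare result is all-ones or all-zeros, so both of its bytes carry the same sign bit, and lane I shows up at bits 2*I and 2*I+1 of the byte mask. The names modelPMOVMSKB and extractV8I16Lane are hypothetical helpers for illustration only.

```cpp
#include <array>
#include <cstdint>

// Scalar model of PMOVMSKB on a vector viewed as 16 bytes: bit J of the
// result is the sign bit of byte J. Lanes are assumed to hold compare
// results, i.e. each is 0x0000 or 0xFFFF (assumption of this sketch).
inline uint32_t modelPMOVMSKB(const std::array<uint16_t, 8> &Lanes) {
  uint32_t Mask = 0;
  for (unsigned I = 0; I != 8; ++I) {
    uint32_t Lo = (Lanes[I] >> 7) & 1u;  // sign bit of the low byte (byte 2*I)
    uint32_t Hi = (Lanes[I] >> 15) & 1u; // sign bit of the high byte (byte 2*I+1)
    Mask |= Lo << (2 * I);
    Mask |= Hi << (2 * I + 1);
  }
  return Mask;
}

// Hypothetical helper: recover lane I of the v8i16 compare from the byte
// mask. Both bits of the pair agree, so reading the even one is enough.
inline bool extractV8I16Lane(uint32_t ByteMask, unsigned I) {
  return (ByteMask >> (2 * I)) & 1u;
}
```

The extra care is only in the bit index: the extract of element I has to look at bit 2*I of the mask instead of bit I.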


================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:34480
+  }
+  return SDValue(ExtElt, 0);  // ExtElt was replaced.
+}
----------------
Can any of the code from combineHorizontalPredicateResult or combineBitcastvxi1 be reused?


================
Comment at: llvm/test/CodeGen/X86/movmsk-cmp.ll:4926
+; SSE2-NEXT:    shrl $3, %ecx
+; SSE2-NEXT:    andl $4, %eax
+; SSE2-NEXT:    shrl $2, %eax
----------------
Are these ands superfluous if we just need the lsb?
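
A small scalar check of the question, assuming a 4-bit MOVMSKPS-style mask and a consumer that reads only bit 0 of the extracted value. The function names are made up for illustration. laneWithAnd mirrors the `andl $4` / `shrl $2` pair in the test output; laneShiftOnly drops the and.

```cpp
#include <cstdint>

// Lane 2 as the current test output computes it: andl $4 ; shrl $2.
inline uint32_t laneWithAnd(uint32_t Mask) { return (Mask & 4u) >> 2; }

// Candidate without the and: shrl $2 only. Higher bits may be left set.
inline uint32_t laneShiftOnly(uint32_t Mask) { return Mask >> 2; }

// A consumer that looks only at the lsb (assumption of this sketch).
inline bool lsbOf(uint32_t V) { return V & 1u; }

// Exhaustively compare both forms over every 4-bit mask value.
inline bool andIsRedundantForLsb() {
  for (uint32_t M = 0; M != 16; ++M)
    if (lsbOf(laneWithAnd(M)) != lsbOf(laneShiftOnly(M)))
      return false;
  return true;
}
```

The two forms agree on the lsb for every mask, but not on the full register value (for mask 0xF one gives 1 and the other 3), so dropping the and is only safe when nothing reads the upper bits.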


================
Comment at: llvm/test/CodeGen/X86/movmsk-cmp.ll:5136
+; SSE2-NEXT:    cmovel %edx, %eax
 ; SSE2-NEXT:    retq
 ;
----------------
For the anyof/allof cases it'd be a lot better if we could merge the tests into a single compare. In c-ray that would allow us to merge the multiple cmp+jmp pairs; two separate jmps packed so close together is a known perf issue.
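
A scalar sketch of the merged form, assuming the lanes of interest are described by a bit set (LaneBits) over the movmsk result. The helper names are hypothetical, and this models the intended semantics only, not the DAG combine itself.

```cpp
#include <cstdint>

// Roughly what we get today for "lane 0 or lane 1": two separate tests,
// each of which can become its own cmp+jmp.
inline bool anyOfSeparate(uint32_t Mask) {
  if (Mask & 1u)
    return true;
  if (Mask & 2u)
    return true;
  return false;
}

// Merged anyof: a single test against the combined lane bits.
inline bool anyOfMerged(uint32_t Mask, uint32_t LaneBits) {
  return (Mask & LaneBits) != 0;
}

// Merged allof: a single and + compare against the combined lane bits.
inline bool allOfMerged(uint32_t Mask, uint32_t LaneBits) {
  return (Mask & LaneBits) == LaneBits;
}
```

With LaneBits = 3, anyOfMerged matches anyOfSeparate for every mask value while needing one branch instead of two.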


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59669/new/

https://reviews.llvm.org/D59669




