[PATCH] D59669: [x86] use movmsk when extracting multiple lanes of a vector compare (PR39665)

Sanjay Patel via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Mar 21 15:09:52 PDT 2019


spatel marked 2 inline comments as done.
spatel added inline comments.


================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:34480
+  }
+  return SDValue(ExtElt, 0);  // ExtElt was replaced.
+}
----------------
RKSimon wrote:
> Can any of the code from combineHorizontalPredicateResult or combineBitcastvxi1 be reused?
I didn't see a way. This raises what might be a fundamental question about these kinds of patterns. 

I was going for a general solution, but if we really only care about reductions or compare-of-compare, then we might be better off trying to add a glob of vector compare logic to the SLP vectorizer (maybe it's already there).

I.e., we could try to form this in IR:

```
  %cmp = fcmp ogt <2 x double> %x, %y
  %e1 = extractelement <2 x i1> %cmp, i32 0
  %e2 = extractelement <2 x i1> %cmp, i32 1
  %u = and i1 %e1, %e2
  =>
  %cmp = fcmp ogt <2 x double> %x, %y
  %bc = bitcast <2 x i1> %cmp to i2
  %u = icmp eq i2 %bc, -1
```
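A scalar C++ model may help sanity-check the equivalence in the IR above. This is only a sketch: `cmpMask`, `allOfExtract`, `allOfMask`, and `checkAllOfEquivalence` are hypothetical names, and the bit layout assumes lane i of the compare maps to bit i of the mask (the layout movmskpd produces). C++ `>` is false when either operand is NaN, which matches the ordered semantics of `fcmp ogt`.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using V2 = std::array<double, 2>;

// Models `fcmp ogt <2 x double>` followed by `bitcast <2 x i1> to i2`:
// lane i of the compare becomes bit i of the returned mask.
unsigned cmpMask(const V2& x, const V2& y) {
  unsigned mask = 0;
  for (unsigned i = 0; i < 2; ++i)
    if (x[i] > y[i]) mask |= 1u << i;
  return mask;
}

// Original IR shape: extract both i1 lanes and AND them.
bool allOfExtract(const V2& x, const V2& y) {
  bool e1 = x[0] > y[0];
  bool e2 = x[1] > y[1];
  return e1 && e2;
}

// Proposed IR shape: `icmp eq i2 %bc, -1`. As an i2, -1 is 0b11,
// i.e. every lane of the compare is true.
bool allOfMask(const V2& x, const V2& y) {
  return cmpMask(x, y) == 0x3u;
}

// Exhaustively compares both shapes over a small input set that
// includes NaN; asserts if they ever disagree.
void checkAllOfEquivalence() {
  const double vals[] = {-1.0, 0.0, 1.0, std::nan("")};
  for (double a : vals)
    for (double b : vals)
      for (double c : vals)
        for (double d : vals) {
          V2 x{a, b};
          V2 y{c, d};
          assert(allOfExtract(x, y) == allOfMask(x, y));
        }
}
```

The point of the bitcast form is that the whole reduction becomes one integer compare on the mask, which maps naturally onto movmsk + cmp on x86.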


================
Comment at: llvm/test/CodeGen/X86/movmsk-cmp.ll:5136
+; SSE2-NEXT:    cmovel %edx, %eax
 ; SSE2-NEXT:    retq
 ;
----------------
RKSimon wrote:
> For the anyof/allof cases it'd be a lot better if we could merge the tests into a single compare; in c-ray that would allow us to merge the multiple cmp+jmp pairs. The 2 separate jmps packed so close together are a known perf issue.
Yes, I think this is an independent problem. Let me see if I can solve at least part of that, so it's not distracting here.
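To illustrate the anyof/allof point in source terms, the hypothetical functions below contrast one test per lane with a single test on a packed mask. This is a sketch only: `cmpMask2`, `anyOfTwoTests`, `anyOfOneTest`, and `allOfOneTest` are illustrative names, and whether the compiler actually emits one branch or two is up to codegen.

```cpp
#include <array>

using V2 = std::array<double, 2>;

// Packs the per-lane results of `x > y` into bits 0 and 1,
// mirroring the lane-to-bit layout of movmskpd.
unsigned cmpMask2(const V2& x, const V2& y) {
  return (x[0] > y[0] ? 1u : 0u) | (x[1] > y[1] ? 2u : 0u);
}

// One test per lane: each `if` is a separate compare-and-branch.
int anyOfTwoTests(const V2& x, const V2& y) {
  if (x[0] > y[0]) return 1;
  if (x[1] > y[1]) return 1;
  return 0;
}

// One test on the packed mask: anyof is `mask != 0`.
int anyOfOneTest(const V2& x, const V2& y) {
  return cmpMask2(x, y) != 0u ? 1 : 0;
}

// allof is `mask == 0x3`: still a single integer compare.
int allOfOneTest(const V2& x, const V2& y) {
  return cmpMask2(x, y) == 0x3u ? 1 : 0;
}
```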



https://reviews.llvm.org/D59669
