[PATCH] D59669: [x86] use movmsk when extracting multiple lanes of a vector compare (PR39665)
Sanjay Patel via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Mar 21 15:09:52 PDT 2019
spatel marked 2 inline comments as done.
spatel added inline comments.
================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:34480
+ }
+ return SDValue(ExtElt, 0); // ExtElt was replaced.
+}
----------------
RKSimon wrote:
> Can any of the code from combineHorizontalPredicateResult or combineBitcastvxi1 be reused?
I didn't see a way. This raises what might be a fundamental question about these kinds of patterns.
I was going for a general solution, but if we really only care about reductions or compare-of-compare, then we might be better off trying to add a glob of vector compare logic to SLP (maybe it's already there)?
I.e., we could try to form this in IR:
```
%cmp = fcmp ogt <2 x double> %x, %y
%e1 = extractelement <2 x i1> %cmp, i32 0
%e2 = extractelement <2 x i1> %cmp, i32 1
%u = and i1 %e1, %e2
=>
%cmp = fcmp ogt <2 x double> %x, %y
%bc = bitcast <2 x i1> %cmp to i2
%u = icmp eq i2 %bc, -1
```
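To make the equivalence behind that IR rewrite concrete, here is a minimal scalar sketch in C++. It is only a model, not code from the patch: `Mask2`, `fcmp_ogt`, `bitcast_to_i2`, and the other names are hypothetical, and it assumes the little-endian lane order used on x86, where element 0 of the `<2 x i1>` lands in bit 0 of the `i2`.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of a <2 x i1> compare result: one bool per lane.
using Mask2 = std::array<bool, 2>;

// Lane-wise model of "fcmp ogt <2 x double>". The built-in > is already
// "ordered": it yields false when either operand is NaN.
Mask2 fcmp_ogt(const std::array<double, 2>& x, const std::array<double, 2>& y) {
    return {x[0] > y[0], x[1] > y[1]};
}

// Before: extract each lane and AND the scalar results.
bool all_of_by_extract(const Mask2& cmp) {
    bool e1 = cmp[0];
    bool e2 = cmp[1];
    return e1 && e2;
}

// Model of "bitcast <2 x i1> to i2", assuming lane 0 maps to bit 0.
std::uint8_t bitcast_to_i2(const Mask2& cmp) {
    return static_cast<std::uint8_t>((cmp[0] ? 1u : 0u) | (cmp[1] ? 2u : 0u));
}

// After: compare the packed mask against all-ones (-1 as an i2 is 0b11).
bool all_of_by_mask(const Mask2& cmp) {
    return bitcast_to_i2(cmp) == 0x3;
}

// Exhaustively check that both forms agree on all four possible masks.
void check_equivalence() {
    for (int bits = 0; bits < 4; ++bits) {
        Mask2 cmp = {(bits & 1) != 0, (bits & 2) != 0};
        assert(all_of_by_extract(cmp) == all_of_by_mask(cmp));
    }
}
```

The packed form is the shape that a movmsk-style instruction produces directly, which is why the second pattern would be the more convenient one for the backend to match.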
================
Comment at: llvm/test/CodeGen/X86/movmsk-cmp.ll:5136
+; SSE2-NEXT: cmovel %edx, %eax
; SSE2-NEXT: retq
;
----------------
RKSimon wrote:
> For the anyof/allof cases it'd be a lot better if we could merge the tests into a single compare. In c-ray that would allow us to merge the multiple cmp+jmp; having 2 separate jmps packed so close together is a known perf issue.
Yes, I think this is an independent problem. Let me see if I can solve at least part of that, so it's not distracting here.
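For reference, one reading of "merge the tests into a single compare" is sketched below in C++. This is an assumption about the intended codegen shape, not the output of this patch or of any follow-up: the function names are hypothetical, and `mask` stands in for a movmsk-style result with one bit per lane of a 2-lane compare.

```cpp
#include <cassert>
#include <cstdint>

// Per-lane form: each lane bit is tested on its own, which tends to turn
// into two compare-and-branch pairs placed back to back.
bool all_of_two_tests(std::uint32_t mask) {
    if ((mask & 0x1) == 0) return false;
    if ((mask & 0x2) == 0) return false;
    return true;
}

bool any_of_two_tests(std::uint32_t mask) {
    if ((mask & 0x1) != 0) return true;
    if ((mask & 0x2) != 0) return true;
    return false;
}

// Merged form: a single compare of the whole mask covers both lanes.
bool all_of_one_test(std::uint32_t mask) {
    return (mask & 0x3) == 0x3;
}

bool any_of_one_test(std::uint32_t mask) {
    return (mask & 0x3) != 0;
}

// Check that the merged forms match the per-lane forms for every 2-bit mask.
void check_merged_tests() {
    for (std::uint32_t mask = 0; mask < 4; ++mask) {
        assert(all_of_two_tests(mask) == all_of_one_test(mask));
        assert(any_of_two_tests(mask) == any_of_one_test(mask));
    }
}
```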
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D59669/new/
https://reviews.llvm.org/D59669