[PATCH] D62969: [x86] narrow extract subvector of vector select
Sanjay Patel via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Jun 6 16:04:07 PDT 2019
spatel marked 2 inline comments as done.
spatel added inline comments.
================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:43172
+ !collectConcatOps(Sel.getOperand(0).getNode(), CatOps))
+ return SDValue();
+
----------------
RKSimon wrote:
> Should we limit this to cases where all the uses of Sel.getOperand(0) are EXTRACT_SUBVECTORs?
This would be difficult because we peeked through bitcasts to get here (because that was required to match most of the cases in the regression tests). So we'd have to check uses of uses. A 1st attempt at that didn't go well - we lost almost all of the improvements.
================
Comment at: llvm/test/CodeGen/X86/horizontal-reduce-smax.ll:541
-; X64-AVX1-NEXT: vblendvpd %ymm2, %ymm0, %ymm1, %ymm0
+; X64-AVX1-NEXT: vblendvpd %xmm2, %xmm0, %xmm1, %xmm0
; X64-AVX1-NEXT: vmovq %xmm0, %rax
; X64-AVX1-NEXT: vzeroupper
----------------
RKSimon wrote:
> This case probably has a extract_element(extract_subvector(vselect())) pattern - should the existing DAGCombine code catch it?
I don't think we have generic combines for shrinking/scalarizing vselects. We would almost certainly need a TLI hook to decide when those are profitable.
The reason I've proposed this as an x86-specific fold is because it allows us to be more aggressive for these specific patterns. We know the 128-bit sequences are going to be cheaper than the 256-bit variants in almost all cases - even if it means we have to extract the true/false values or have extra uses of the select.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D62969/new/
https://reviews.llvm.org/D62969
More information about the llvm-commits
mailing list