[PATCH] D29690: [AVX512] Improve EXTRACT_VECTOR_ELT with variable index.
Igor Breger via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Feb 9 04:38:11 PST 2017
igorb added inline comments.
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:26982
if (UnaryShuffle && (Depth >= 3 || HasVariableMask) && !MaskContainsZeros &&
- ((Subtarget.hasAVX2() &&
- (MaskVT == MVT::v8f32 || MaskVT == MVT::v8i32)) ||
- (Subtarget.hasAVX512() &&
- (MaskVT == MVT::v8f64 || MaskVT == MVT::v8i64 ||
- MaskVT == MVT::v16f32 || MaskVT == MVT::v16i32)) ||
- (Subtarget.hasBWI() && MaskVT == MVT::v32i16) ||
- (Subtarget.hasBWI() && Subtarget.hasVLX() && MaskVT == MVT::v16i16) ||
- (Subtarget.hasVBMI() && MaskVT == MVT::v64i8) ||
- (Subtarget.hasVBMI() && Subtarget.hasVLX() && MaskVT == MVT::v32i8))) {
+ isVPERMVsupported(MaskVT, Subtarget)) {
MVT VPermMaskSVT = MVT::getIntegerVT(MaskEltSizeInBits);
----------------
RKSimon wrote:
> Haven't you've lost the hasVLX() check by using isVPERMVsupported?
I added patterns to TD file to handle 128/256 bit vector in case NoVLX using 512bit.
I think the check here only to insure VPERMV can be used.
================
Comment at: test/CodeGen/X86/avx512-insert-extract.ll:1520
+; KNL-NEXT: vmovq %rax, %xmm1
+; KNL-NEXT: vpermps %zmm0, %zmm1, %zmm0
+; KNL-NEXT: ## kill: %XMM0<def> %XMM0<kill> %ZMM0<kill>
----------------
delena wrote:
> It should be vpermpd.
Thanks!
================
Comment at: test/CodeGen/X86/avx512-insert-extract.ll:1693
+; KNL-NEXT: ## kill: %EDI<def> %EDI<kill> %RDI<def>
+; KNL-NEXT: vmovaps %xmm0, -{{[0-9]+}}(%rsp)
+; KNL-NEXT: andl $7, %edi
----------------
delena wrote:
> I propose to extend v8i16 to v8i32 and use vpermd.
Thanks Elena,
I prefer to handle all cases that can be extend in separate path.
================
Comment at: test/CodeGen/X86/avx512-insert-extract.ll:1780
+; KNL: ## BB#0:
+; KNL-NEXT: movzbl %dil, %eax
+; KNL-NEXT: vmovd %eax, %xmm1
----------------
delena wrote:
> The result will be wrong. If you have '1' in MSB, the destination will be zeroed. You should mask the %index with 0xF:
> and $15, %edi
> vmovd %edi, %xmm
>
Max 5 bits are used , according to LLVM doc "If idx exceeds the length of val, the results are undefined."
Repository:
rL LLVM
https://reviews.llvm.org/D29690
More information about the llvm-commits
mailing list