[PATCH] D13633: Type legalizer for masked gather/scatter intrinsics

Tue Dec 15 00:24:22 PST 2015

delena marked an inline comment as done.

================
Comment at: ../lib/Target/X86/X86ISelLowering.cpp:19849
@@ +19848,3 @@
+  MVT BitMaskVT = MVT::getVectorVT(MVT::i1, NumElts);
+  Mask = DAG.getNode(ISD::TRUNCATE, dl, BitMaskVT, Mask);
+
----------------
mbodart wrote:
> Minor mechanics question:  should we first check if Mask's element type is i1, and if so, avoid inserting the TRUNCATE?
getNode(ISD::TRUNCATE..) does this optimization.

================
Comment at: ../lib/Target/X86/X86ISelLowering.cpp:19946
@@ +19945,3 @@
+
+    // Minimal number of elements in Gather
+    NumElts = 8;
----------------
mbodart wrote:
> For a 32-bit target, we can gather 16 elements with a single gather.
> Though the comment says 8 is the minimum number of elements in
> a gather, which is true, the code also seems to be treating this as
> the maximum number.  How are 16-element gathers supported?
We always can sign-extend the indices. VGATHERQPS is supported in 32-bit mode.

================
Comment at: ../lib/Target/X86/X86ISelLowering.cpp:26980
@@ +26979,3 @@
+  SDLoc DL(N);
+  // The mask will be truncated anyway. The SIGN_EXTEND_INREG is redundant.
+  SDValue Mask = N->getOperand(2);
----------------
mbodart wrote:
> It is hard to interpret this comment, and thus the correctness of this routine,
> without any additional context.
> Why is a mask always truncated at this point?
> How do you know that it is truncated down to
> the element size of the SIGN_EXTEND_INREG's operand?
> Please add some supporting comments answering these kinds of questions.
I added more words here:
  // Gather and Scatter instructions use k-registers for masks. The type of
  // the masks is v*i1. So the mask will be truncated anyway.
  // The SIGN_EXTEND_INREG my be dropped.


================
Comment at: ../test/CodeGen/X86/masked_gather_scatter.ll:2-4
@@ -2,1 +1,5 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=x86_64-unknown-linux-gnu  -mattr=+avx512f < %s | FileCheck %s --check-prefix=KNL_64
+; RUN: llc -mtriple=i386-unknown-linux-gnu  -mattr=+avx512f < %s | FileCheck %s --check-prefix=KNL_32
+; RUN: llc -mtriple=x86_64-unknown-linux-gnu  -mattr=+avx512vl -mattr=+avx512dq < %s | FileCheck %s --check-prefix=SKX
 ; RUN: opt -mtriple=x86_64-apple-darwin -codegenprepare -mcpu=corei7-avx -S < %s | FileCheck %s -check-prefix=SCALAR
----------------
mbodart wrote:
> I would think we also want a test for i386 and avx512vl.
I added.


Repository:
  rL LLVM

http://reviews.llvm.org/D13633