[PATCH] D13722: Scalarization for masked gather/scatter intrinsics
Elena Demikhovsky via llvm-commits
llvm-commits at lists.llvm.org
Sun Oct 18 04:51:22 PDT 2015
delena marked an inline comment as done.
delena added a comment.
I'll split this into separate patches:
1. Remove the "Consecutive" argument from isLegalMaskedLoad() / isLegalMaskedStore()
2. Alignment handling and constant masks for the masked load/store scalarization procedure
3. The gather/scatter scalarization procedure
================
Comment at: ../include/llvm/Analysis/TargetTransformInfo.h:580
@@ -572,1 +579,3 @@
virtual bool isLegalMaskedLoad(Type *DataType, int Consecutive) = 0;
+ virtual bool isLegalMaskedScatter(Type *DataType) = 0;
+ virtual bool isLegalMaskedGather(Type *DataType) = 0;
----------------
mkuper wrote:
> How is this different from isLegalMaskedStore() with Consecutive == 0?
You are right; I had forgotten about "Consecutive". I decided to remove "Consecutive" from the load/store hooks instead.
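After the change, the hooks would look something like this (a sketch only; the exact final signatures may differ):

  virtual bool isLegalMaskedLoad(Type *DataType) = 0;
  virtual bool isLegalMaskedStore(Type *DataType) = 0;
  virtual bool isLegalMaskedGather(Type *DataType) = 0;
  virtual bool isLegalMaskedScatter(Type *DataType) = 0;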
================
Comment at: ../lib/CodeGen/CodeGenPrepare.cpp:1123
@@ -1122,3 +1122,3 @@
Value *Ptr = CI->getArgOperand(0);
- Value *Src0 = CI->getArgOperand(3);
+ Value *Alignment = CI->getArgOperand(1);
Value *Mask = CI->getArgOperand(2);
----------------
mkuper wrote:
> Perhaps these changes belong in a separate patch from the addition of ScalarizeMaskedGather()?
I added alignment handling for gather/scatter, and updated it here as well. I can split that out as a separate patch.
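For reference, this is the operand layout of the masked load intrinsic that the code above unpacks:

  // <N x T> @llvm.masked.load(<N x T>* %ptr, i32 %align,
  //                           <N x i1> %mask, <N x T> %passthru)
  Value *Ptr       = CI->getArgOperand(0);
  Value *Alignment = CI->getArgOperand(1);
  Value *Mask      = CI->getArgOperand(2);
  Value *Src0      = CI->getArgOperand(3); // pass-through value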
================
Comment at: ../lib/CodeGen/CodeGenPrepare.cpp:1128
@@ -1128,1 +1127,3 @@
+ unsigned AlignVal = cast<ConstantInt>(Alignment)->getZExtValue();
+ VectorType *VecType = dyn_cast<VectorType>(CI->getType());
assert(VecType && "Unexpected return type of masked load intrinsic");
----------------
mkuper wrote:
> Why is this a dyn_cast, if the result is assumed to be non-null in line 1131? If this is just for the assert, I'd make this a cast, and make the assert
> assert(isa<VectorType>(CI->getType())).
The form is not critical here. I use VecType later, so this form is more convenient for me.
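For comparison, the variant suggested above would look roughly like:

  assert(isa<VectorType>(CI->getType()) &&
         "Unexpected return type of masked load intrinsic");
  VectorType *VecType = cast<VectorType>(CI->getType());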
================
Comment at: ../lib/CodeGen/CodeGenPrepare.cpp:1155
@@ -1139,1 +1154,3 @@
+ // Adjust alignment for the scalar instruction.
+ AlignVal = std::max(AlignVal, VecType->getScalarSizeInBits());
// Bitcast %addr from i8* to EltTy*
----------------
mkuper wrote:
> This is a bit odd.
> If AlignVal >= VecType->getScalarSizeInBits(), then this is a nop.
> So, let's say originally AlignVal < VecType->getScalarSizeInBits(). This means that you're raising the alignment for the store of the first element. Are you sure that's what you want?
Thanks!
I changed it to "min" (that is what I originally meant). If the masked operation has alignment 16 (it is a vector op), I can't apply 16 to the scalar accesses, so I reduce the alignment to that of the scalar type.
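A sketch of the adjusted line, assuming AlignVal is in bytes (note the bits-to-bytes conversion, which the hunk above does not show):

  // Clamp the vector alignment down to what a single element access
  // can guarantee; getScalarSizeInBits() is in bits, hence the /8.
  AlignVal = std::min(AlignVal, VecType->getScalarSizeInBits() / 8);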
================
Comment at: ../lib/CodeGen/CodeGenPrepare.cpp:1358
@@ +1357,3 @@
+static void ScalarizeMaskedGather(CallInst *CI) {
+ Value *Ptrs = CI->getArgOperand(0);
+ Value *Alignment = CI->getArgOperand(1);
----------------
mkuper wrote:
> This looks very similar to ScalarizeMaskedLoad, except for the all-ones case.
> Can they perhaps be combined?
It is similar but not the same: here I also extract each pointer from the vector of pointers. I don't want to mix scatter with store, or gather with load.
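The per-lane expansion that distinguishes a gather from a masked load looks roughly like this (names are illustrative, not the exact patch code):

  // Inside the lane-Idx portion of ScalarizeMaskedGather: pull the
  // lane's pointer out of the pointer vector, load through it, and
  // insert the loaded element into the vector being built.
  Value *Ptr = Builder.CreateExtractElement(Ptrs, Builder.getInt32(Idx));
  Value *Elt = Builder.CreateAlignedLoad(Ptr, AlignVal);
  VResult = Builder.CreateInsertElement(VResult, Elt, Builder.getInt32(Idx));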
================
Comment at: ../lib/CodeGen/CodeGenPrepare.cpp:1387
@@ +1386,3 @@
+
+ if (IsAllOnesMask) {
+ for (unsigned Idx = 0; Idx < VectorWidth; ++Idx) {
----------------
mkuper wrote:
> Let's say the mask is a constant, but not an all-ones constant.
> We could just iterate over the bits of the mask, and place only the loads where the bit is actually 1, instead of creating all of the branchy code.
>
> Normally it wouldn't matter, since SimplifyCFG (I think) would clean it up - but I don't think there's a SimplifyCFG run between here and ISel. Does something else clean up the mess for the non-all-ones constant case?
Yes, this optimization is possible, for load/store as well. I'll add it.
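That constant-mask path could look something like this (a sketch; the real patch may structure it differently):

  if (Constant *MaskC = dyn_cast<Constant>(Mask)) {
    for (unsigned Idx = 0; Idx < VectorWidth; ++Idx) {
      // Lanes whose mask bit is statically zero need no code at all.
      if (MaskC->getAggregateElement(Idx)->isNullValue())
        continue;
      Value *Ptr = Builder.CreateExtractElement(Ptrs, Builder.getInt32(Idx));
      Value *Elt = Builder.CreateAlignedLoad(Ptr, AlignVal);
      VResult = Builder.CreateInsertElement(VResult, Elt,
                                            Builder.getInt32(Idx));
    }
  }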
================
Comment at: ../lib/Target/X86/X86TargetTransformInfo.cpp:1173
@@ -1173,3 +1172,3 @@
return false;
if (ST->hasAVX512() || ST->hasAVX2())
return true;
----------------
mkuper wrote:
> In case you're going to touch this code anyway - isn't it enough to check hasAVX2() (That is, doesn't hasAVX512() imply hasAVX2() ?) )
This function should be changed anyway:
1. We have masked load/store on AVX, not only on AVX2.
2. We have masked load/store for i16 and i8 vectors on SKX.
But I still need to add CodeGen support for these.
I'll change it to hasAVX2() in the meantime.
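So for now the check collapses to something like:

  // AVX-512 implies AVX2, so testing AVX2 alone covers both.
  if (ST->hasAVX2())
    return true;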
================
Comment at: ../lib/Target/X86/X86TargetTransformInfo.cpp:1190
@@ +1189,3 @@
+ // AVX-512 allows gather and scatter
+ return DataWidth >= 32 && ST->hasAVX512();
+}
----------------
mkuper wrote:
> Shouldn't there be an upper limit to the DataWidth too? (Or to the vector element count, for that matter?)
If the vector is too wide, the type legalizer will split it.
I'm not sure what happens when the vector width is not a power of 2; I can reject that case.
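Rejecting it could look something like this (a sketch; isPowerOf2_32 is from llvm/Support/MathExtras.h, and the surrounding names follow the hunks above):

  // Be conservative: only allow power-of-2 element counts, which the
  // type legalizer knows how to split.
  if (!isPowerOf2_32(DataType->getVectorNumElements()))
    return false;
  // AVX-512 allows gather and scatter.
  return DataWidth >= 32 && ST->hasAVX512();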
Repository:
rL LLVM
http://reviews.llvm.org/D13722