[PATCH] D13722: Scalarization for masked gather/scatter intrinsics

Thu Oct 15 05:29:11 PDT 2015

mkuper added a comment.

Hi Elena,
Some comments inline.


================
Comment at: ../include/llvm/Analysis/TargetTransformInfo.h:580
@@ -572,1 +579,3 @@
   virtual bool isLegalMaskedLoad(Type *DataType, int Consecutive) = 0;
+  virtual bool isLegalMaskedScatter(Type *DataType) = 0;
+  virtual bool isLegalMaskedGather(Type *DataType) = 0;
----------------
How is this different from isLegalMaskedStore() with Consecutive == 0?

================
Comment at: ../lib/CodeGen/CodeGenPrepare.cpp:1123
@@ -1122,3 +1122,3 @@
   Value *Ptr  = CI->getArgOperand(0);
-  Value *Src0 = CI->getArgOperand(3);
+  Value *Alignment = CI->getArgOperand(1);
   Value *Mask = CI->getArgOperand(2);
----------------
Perhaps these changes belong in a separate patch from the addition of ScalarizeMaskedGather()?

================
Comment at: ../lib/CodeGen/CodeGenPrepare.cpp:1128
@@ -1128,1 +1127,3 @@
+  unsigned AlignVal = cast<ConstantInt>(Alignment)->getZExtValue();
+  VectorType *VecType = dyn_cast<VectorType>(CI->getType());
   assert(VecType && "Unexpected return type of masked load intrinsic");
----------------
Why is this a dyn_cast, if the result is assumed to be non-null in line 1131? If this is just for the assert, I'd make this a cast, and make the assert 
assert(isa<VectorType>(CI->getType())).

================
Comment at: ../lib/CodeGen/CodeGenPrepare.cpp:1155
@@ -1139,1 +1154,3 @@
+  // Adjust alignment for the scalar instruction.
+  AlignVal = std::max(AlignVal, VecType->getScalarSizeInBits());
   // Bitcast %addr fron i8* to EltTy*
----------------
This is a bit odd. 
If AlignVal >= VecType->getScalarSizeInBits(), then this is a nop.
So, let's say originally AlignVal < VecType->getScalarSizeInBits(). This means that you're raising the alignment for the store of the first element. Are you sure that's what you want?

================
Comment at: ../lib/CodeGen/CodeGenPrepare.cpp:1252
@@ -1234,3 +1251,3 @@
 static void ScalarizeMaskedStore(CallInst *CI) {
-  Value *Ptr  = CI->getArgOperand(1);
   Value *Src = CI->getArgOperand(0);
+  Value *Ptr  = CI->getArgOperand(1);
----------------
Same comments apply as for the MaskedLoad.

================
Comment at: ../lib/CodeGen/CodeGenPrepare.cpp:1358
@@ +1357,3 @@
+static void ScalarizeMaskedGather(CallInst *CI) {
+  Value *Ptrs = CI->getArgOperand(0);
+  Value *Alignment = CI->getArgOperand(1);
----------------
This looks very similar to ScalarizeMaskedLoad, except for the all-ones case.
Can they perhaps be combined?

================
Comment at: ../lib/CodeGen/CodeGenPrepare.cpp:1387
@@ +1386,3 @@
+
+  if (IsAllOnesMask) {
+    for (unsigned Idx = 0; Idx < VectorWidth; ++Idx) {
----------------
Let's say the mask is a constant, but not an all-ones constant.
We could just iterate over the bits of the mask, and place only the loads where the bit is actually 1, instead of creating all of the branchy code.

Normally it wouldn't matter, since SimplifyCFG (I think) would clean it up - but I don't think there's a SimplifyCFG run between here and ISel. Does something else clean up the mess for the non-all-ones constant case?

================
Comment at: ../lib/Target/X86/X86TargetTransformInfo.cpp:1173
@@ -1173,3 +1172,3 @@
     return false;
   if (ST->hasAVX512() || ST->hasAVX2())
     return true;
----------------
In case you're going to touch this code anyway - isn't it enough to check hasAVX2() (That is, doesn't hasAVX512() imply hasAVX2() ?) )

================
Comment at: ../lib/Target/X86/X86TargetTransformInfo.cpp:1190
@@ +1189,3 @@
+  // AVX-512 allows gather and scatter
+  return DataWidth >= 32 && ST->hasAVX512();
+}
----------------
Shouldn't there be an upper limit to the DataWidth too? (Or to the vector element count, for that matter?)


Repository:
  rL LLVM

http://reviews.llvm.org/D13722