[PATCH] D13722: Scalarization for masked gather/scatter intrinsics
Elena Demikhovsky via llvm-commits
llvm-commits at lists.llvm.org
Sun Oct 18 04:51:22 PDT 2015
delena marked an inline comment as done.
delena added a comment.
I'll split this into separate patches:
1. Remove the "Consecutive" argument from isLegalMaskedLoad() / isLegalMaskedStore()
2. Alignment handling and constant masks for the masked load/store scalarization procedure
3. The gather/scatter scalarization procedure
================
Comment at: ../include/llvm/Analysis/TargetTransformInfo.h:580
@@ -572,1 +579,3 @@
virtual bool isLegalMaskedLoad(Type *DataType, int Consecutive) = 0;
+ virtual bool isLegalMaskedScatter(Type *DataType) = 0;
+ virtual bool isLegalMaskedGather(Type *DataType) = 0;
----------------
mkuper wrote:
> How is this different from isLegalMaskedStore() with Consecutive == 0?
You are right; I had forgotten about "Consecutive". I decided to remove "Consecutive" from the load/store hooks instead.
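After the change, the hooks would look something like this (a sketch only; the exact final signatures may differ):

  virtual bool isLegalMaskedLoad(Type *DataType) = 0;
  virtual bool isLegalMaskedStore(Type *DataType) = 0;
  virtual bool isLegalMaskedGather(Type *DataType) = 0;
  virtual bool isLegalMaskedScatter(Type *DataType) = 0;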
================
Comment at: ../lib/CodeGen/CodeGenPrepare.cpp:1123
@@ -1122,3 +1122,3 @@
Value *Ptr = CI->getArgOperand(0);
- Value *Src0 = CI->getArgOperand(3);
+ Value *Alignment = CI->getArgOperand(1);
Value *Mask = CI->getArgOperand(2);
----------------
mkuper wrote:
> Perhaps these changes belong in a separate patch from the addition of ScalarizeMaskedGather()?
I added alignment handling for gather/scatter, and updated it here as well. I can split that out as a separate patch.
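For reference, this is the operand layout of the masked load intrinsic that the code above unpacks:

  // <N x T> @llvm.masked.load(<N x T>* %ptr, i32 %align,
  //                           <N x i1> %mask, <N x T> %passthru)
  Value *Ptr       = CI->getArgOperand(0);
  Value *Alignment = CI->getArgOperand(1);
  Value *Mask      = CI->getArgOperand(2);
  Value *Src0      = CI->getArgOperand(3); // pass-through value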
================
Comment at: ../lib/CodeGen/CodeGenPrepare.cpp:1128
@@ -1128,1 +1127,3 @@
+ unsigned AlignVal = cast<ConstantInt>(Alignment)->getZExtValue();
+ VectorType *VecType = dyn_cast<VectorType>(CI->getType());
assert(VecType && "Unexpected return type of masked load intrinsic");
----------------
mkuper wrote:
> Why is this a dyn_cast, if the result is assumed to be non-null in line 1131? If this is just for the assert, I'd make this a cast, and make the assert
> assert(isa<VectorType>(CI->getType())).
The form is not critical here. I use VecType later, so this form is more convenient for me.
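For comparison, the variant suggested above would look roughly like:

  assert(isa<VectorType>(CI->getType()) &&
         "Unexpected return type of masked load intrinsic");
  VectorType *VecType = cast<VectorType>(CI->getType());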
================
Comment at: ../lib/CodeGen/CodeGenPrepare.cpp:1155
@@ -1139,1 +1154,3 @@
+ // Adjust alignment for the scalar instruction.
+ AlignVal = std::max(AlignVal, VecType->getScalarSizeInBits());
// Bitcast %addr from i8* to EltTy*
----------------
mkuper wrote:
> This is a bit odd.
> If AlignVal >= VecType->getScalarSizeInBits(), then this is a nop.
> So, let's say originally AlignVal < VecType->getScalarSizeInBits(). This means that you're raising the alignment for the store of the first element. Are you sure that's what you want?
Thanks!
I changed it to "min" (that is what I originally meant). If the masked operation has alignment 16 (it is a vector op), I can't apply 16 to the scalar accesses, so I reduce the alignment to that of the scalar type.
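A sketch of the adjusted line, assuming AlignVal is in bytes (note the bits-to-bytes conversion, which the hunk above does not show):

  // Clamp the vector alignment down to what a single element access
  // can guarantee; getScalarSizeInBits() is in bits, hence the /8.
  AlignVal = std::min(AlignVal, VecType->getScalarSizeInBits() / 8);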
================
Comment at: ../lib/CodeGen/CodeGenPrepare.cpp:1358
@@ +1357,3 @@
+static void ScalarizeMaskedGather(CallInst *CI) {
+ Value *Ptrs = CI->getArgOperand(0);
+ Value *Alignment = CI->getArgOperand(1);
----------------
mkuper wrote:
> This looks very similar to ScalarizeMaskedLoad, except for the all-ones case.
> Can they perhaps be combined?
It is similar but not the same: here I also extract each pointer from the vector of pointers. I don't want to mix scatter with store, or gather with load.
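The per-lane expansion that distinguishes a gather from a masked load looks roughly like this (names are illustrative, not the exact patch code):

  // Inside the lane-Idx portion of ScalarizeMaskedGather: pull the
  // lane's pointer out of the pointer vector, load through it, and
  // insert the loaded element into the vector being built.
  Value *Ptr = Builder.CreateExtractElement(Ptrs, Builder.getInt32(Idx));
  Value *Elt = Builder.CreateAlignedLoad(Ptr, AlignVal);
  VResult = Builder.CreateInsertElement(VResult, Elt, Builder.getInt32(Idx));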
================
Comment at: ../lib/CodeGen/CodeGenPrepare.cpp:1387
@@ +1386,3 @@
+
+ if (IsAllOnesMask) {
+ for (unsigned Idx = 0; Idx < VectorWidth; ++Idx) {
----------------
mkuper wrote:
> Let's say the mask is a constant, but not an all-ones constant.
> We could just iterate over the bits of the mask, and place only the loads where the bit is actually 1, instead of creating all of the branchy code.
>
> Normally it wouldn't matter, since SimplifyCFG (I think) would clean it up - but I don't think there's a SimplifyCFG run between here and ISel. Does something else clean up the mess for the non-all-ones constant case?
Yes, this optimization is possible, for load/store as well. I'll add it.
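That constant-mask path could look something like this (a sketch; the real patch may structure it differently):

  if (Constant *MaskC = dyn_cast<Constant>(Mask)) {
    for (unsigned Idx = 0; Idx < VectorWidth; ++Idx) {
      // Lanes whose mask bit is statically zero need no code at all.
      if (MaskC->getAggregateElement(Idx)->isNullValue())
        continue;
      Value *Ptr = Builder.CreateExtractElement(Ptrs, Builder.getInt32(Idx));
      Value *Elt = Builder.CreateAlignedLoad(Ptr, AlignVal);
      VResult = Builder.CreateInsertElement(VResult, Elt,
                                            Builder.getInt32(Idx));
    }
  }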
================
Comment at: ../lib/Target/X86/X86TargetTransformInfo.cpp:1173
@@ -1173,3 +1172,3 @@
return false;
if (ST->hasAVX512() || ST->hasAVX2())
return true;
----------------
mkuper wrote:
> In case you're going to touch this code anyway - isn't it enough to check hasAVX2() (That is, doesn't hasAVX512() imply hasAVX2() ?) )
This function should be changed anyway:
1. We have masked load/store on AVX, not only on AVX2.
2. We have masked load/store for i16 and i8 vectors on SKX.
But I still need to add CodeGen support for these.
I'll change it to hasAVX2() in the meantime.
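So for now the check collapses to something like:

  // AVX-512 implies AVX2, so testing AVX2 alone covers both.
  if (ST->hasAVX2())
    return true;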
================
Comment at: ../lib/Target/X86/X86TargetTransformInfo.cpp:1190
@@ +1189,3 @@
+ // AVX-512 allows gather and scatter
+ return DataWidth >= 32 && ST->hasAVX512();
+}
----------------
mkuper wrote:
> Shouldn't there be an upper limit to the DataWidth too? (Or to the vector element count, for that matter?)
If the vector is too wide, the type legalizer will split it.
I'm not sure what happens when the vector width is not a power of 2; I can reject that case.
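Rejecting it could look something like this (a sketch; isPowerOf2_32 is from llvm/Support/MathExtras.h, and the surrounding names follow the hunks above):

  // Be conservative: only allow power-of-2 element counts, which the
  // type legalizer knows how to split.
  if (!isPowerOf2_32(DataType->getVectorNumElements()))
    return false;
  // AVX-512 allows gather and scatter.
  return DataWidth >= 32 && ST->hasAVX512();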
Repository:
rL LLVM
http://reviews.llvm.org/D13722