[PATCH] D36058: [X86][LLVM]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess (VF8 stride 4).

michael zuckerman via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Aug 22 00:58:54 PDT 2017


m_zuckerman added inline comments.


================
Comment at: lib/Target/X86/X86InterleavedAccess.cpp:124
+  if (ShuffleElemSize == 8 && isa<StoreInst>(Inst) &&
+      (WideInstSize == 256 || WideInstSize == 512 || WideInstSize == 1024))
+    return true;
----------------
Farhana wrote:
> Farhana wrote:
> > Please remove the extra space after "&&". 
> > 
> Do we also need to check DL.getTypeSizeInBits(ShuffleVecTy) == 256 here? 
Yes since in default mode, the compose function creates 4 xmm where only half of the wide is been used. I want that high bits will be left undef so I am checking 4*xmm = 256 in the next step will fill the undef part. 


================
Comment at: lib/Target/X86/X86InterleavedAccess.cpp:200
+
+static void createLowUnpackMask(int NumElements,
+                                SmallVectorImpl<uint32_t> &Mask,MVT VT) {
----------------
Farhana wrote:
> Is there anything special about this unpackMask? Why do we need a separate unpackmask generator? Why aren't you extending the createUnpackShuffleMask()? If createLowUnpackMask is unpacking in a different way than createUnpackShuffleMask() that distinction should be documented and clearly noted why we need a separate unpackMask-creator.  
The regular unpack function doesn't create * 2XN * as we need in this case and creates only half from one vector and another half from the second vector. In mine function, we use all the vector and the not only half. For the backend, the result is the same but the way they create is different.


https://reviews.llvm.org/D36058





More information about the llvm-commits mailing list