[PATCH] D34601: [X86][LLVM]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess.

Wed Jul 12 09:50:22 PDT 2017

m_zuckerman added inline comments.

================
Comment at: lib/Target/X86/X86InterleavedAccess.cpp:73
                      SmallVectorImpl<Value *> &TrasposedVectors);
-
+  void transposeChar_32x4(ArrayRef<Instruction *> InputVectors,
+                          SmallVectorImpl<Value *> &TrasposedVectors);
----------------
DavidKreitzer wrote:
> DavidKreitzer wrote:
> > "transpose" is a poor name here.  "interleave" would be better. Also, I would prefer "8bit" or "1byte" to "Char", e.g. interleave8bit_32x4.
> > 
> > "transpose" works for the 4x4 case (and other NxN cases), because the shuffle sequence does a matrix transpose on the input vectors, and the same code can be used for interleaving and de-interleaving. To handle the 32x8 load case, we would need a different code sequence than what you are currently generating in transposeChar_32x4. Presumably, we would use deinterleave8bit_32x4 for that.
> > 
> I see that you changed this to "deinterleave8bit_32x4" rather than "interleave8bit_32x4". Can you please explain why? This routine is taking 4 input vectors and merging their elements like this:
> 
> v0[0], v1[0], v2[0], v3[0], v0[1], v1[1], v2[1], v3[1], ...
> 
> Wouldn't you call that interleaving?
You are right, I swapped the terminology. 

https://reviews.llvm.org/D34601