[PATCH] D34601: [X86][LLVM]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess.
michael zuckerman via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Jul 12 09:50:22 PDT 2017
m_zuckerman added inline comments.
================
Comment at: lib/Target/X86/X86InterleavedAccess.cpp:73
SmallVectorImpl<Value *> &TrasposedVectors);
-
+ void transposeChar_32x4(ArrayRef<Instruction *> InputVectors,
+ SmallVectorImpl<Value *> &TrasposedVectors);
----------------
DavidKreitzer wrote:
> DavidKreitzer wrote:
> > "transpose" is a poor name here. "interleave" would be better. Also, I would prefer "8bit" or "1byte" to "Char", e.g. interleave8bit_32x4.
> >
> > "transpose" works for the 4x4 case (and other NxN cases), because the shuffle sequence does a matrix transpose on the input vectors, and the same code can be used for interleaving and de-interleaving. To handle the 32x8 load case, we would need a different code sequence than what you are currently generating in transposeChar_32x4. Presumably, we would use deinterleave8bit_32x4 for that.
> >
> I see that you changed this to "deinterleave8bit_32x4" rather than "interleave8bit_32x4". Can you please explain why? This routine is taking 4 input vectors and merging their elements like this:
>
> v0[0], v1[0], v2[0], v3[0], v0[1], v1[1], v2[1], v3[1], ...
>
> Wouldn't you call that interleaving?
You are right, I swapped the terminology.
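
For reference, the element order quoted above can be written as a plain scalar loop. This is only an illustrative sketch, not the shuffle sequence the patch generates: `interleave4` and `deinterleave4` are hypothetical names, and the code assumes all four inputs have the same length.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar reference (not the code from the patch): merges four
// inputs as v0[0], v1[0], v2[0], v3[0], v0[1], v1[1], v2[1], v3[1], ...
// Assumes all four inputs have the same length.
std::vector<uint8_t>
interleave4(const std::array<std::vector<uint8_t>, 4> &In) {
  const std::size_t N = In[0].size();
  std::vector<uint8_t> Out;
  Out.reserve(4 * N);
  for (std::size_t I = 0; I < N; ++I)
    for (const auto &V : In)
      Out.push_back(V[I]);
  return Out;
}

// The opposite direction: splits one merged stream back into four vectors,
// sending element I to output I % 4.
std::array<std::vector<uint8_t>, 4>
deinterleave4(const std::vector<uint8_t> &In) {
  std::array<std::vector<uint8_t>, 4> Out;
  for (std::size_t I = 0; I < In.size(); ++I)
    Out[I % 4].push_back(In[I]);
  return Out;
}
```

With these definitions, the routine under review matches the first function (four vectors in, one merged element stream out), which is why "interleave" is the fitting name.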
https://reviews.llvm.org/D34601