[PATCH] D34601: [X86][LLVM]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess.
David Kreitzer via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Jul 12 06:41:59 PDT 2017
DavidKreitzer added a comment.
Thanks for the changes, Michael, and thanks for following up on the perf issue in https://bugs.llvm.org/show_bug.cgi?id=33740.
I have a few additional comments, but this generally looks good. The simplification of the code in X86InterleavedAccessGroup::lowerIntoOptimizedSequence is great!
================
Comment at: lib/Target/X86/X86ISelLowering.h:1440
+ void createUnpackShuffleMask(MVT VT, SmallVectorImpl<T> &Mask, bool Lo,
+ bool Unary) {
+ assert(Mask.empty() && "Expected an empty shuffle mask vector");
----------------
Fix the indenting here.
================
Comment at: lib/Target/X86/X86InterleavedAccess.cpp:73
SmallVectorImpl<Value *> &TrasposedVectors);
-
+ void transposeChar_32x4(ArrayRef<Instruction *> InputVectors,
+ SmallVectorImpl<Value *> &TrasposedVectors);
----------------
DavidKreitzer wrote:
> "transpose" is a poor name here. "interleave" would be better. Also, I would prefer "8bit" or "1byte" to "Char", e.g. interleave8bit_32x4.
>
> "transpose" works for the 4x4 case (and other NxN cases), because the shuffle sequence does a matrix transpose on the input vectors, and the same code can be used for interleaving and de-interleaving. To handle the 32x8 load case, we would need a different code sequence than what you are currently generating in transposeChar_32x4. Presumably, we would use deinterleave8bit_32x4 for that.
>
I see that you changed this to "deinterleave8bit_32x4" rather than "interleave8bit_32x4". Can you please explain why? This routine is taking 4 input vectors and merging their elements like this:
v0[0], v1[0], v2[0], v3[0], v0[1], v1[1], v2[1], v3[1], ...
Wouldn't you call that interleaving?
================
Comment at: lib/Target/X86/X86InterleavedAccess.cpp:113
- if (!Subtarget.hasAVX() || ShuffleVecSize != ExpectedShuffleVecSize ||
- DL.getTypeSizeInBits(ShuffleEltTy) != 64 || Factor != 4)
+ // Currently 8 bit of 32 elements support store oration only.
+ if (DL.getTypeSizeInBits(ShuffleEltTy) == 8 && !isa<StoreInst>(Inst))
----------------
The old wording here was better, and you have a typo in "oration".
================
Comment at: lib/Target/X86/X86InterleavedAccess.cpp:198
+void createUnpackShuffleMaskDType(MVT VT, SmallVectorImpl<T> &Mask, bool Lo,
+ bool Unary) {
+ assert(Mask.empty() && "Expected an empty shuffle mask vector");
----------------
Would it be possible to simplify the implementation of this routine by just calling createUnpackShuffleMask with an i16 type plus calling scaleShuffleMask with a scale of 2? (That would require you to move scaleShuffleMask into X86ISelLowering.h like you did with createUnpackShuffleMask.)
https://reviews.llvm.org/D34601
More information about the llvm-commits
mailing list