[PATCH] D36960: [X86][LLVM] Expanding support for lowerInterleavedLoad() in X86InterleavedAccess (VF{8|16|32} stride 3).

Zvi Rackover via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Sep 4 02:06:01 PDT 2017


zvi added inline comments.


================
Comment at: lib/Target/X86/X86InterleavedAccess.cpp:478
+
+  Value *TempVec = Builder.CreateShuffleVector(Vec[1], Vec[1], VPAlign3);
+  TransposedMatrix[0] = Builder.CreateShuffleVector(Vec[0], Vec[0], VPAlign2);
----------------
m_zuckerman wrote:
> zvi wrote:
> > The second vector argument should be an undef, and the shuffle mask should select only elements from the first vector operand.  This will eventually happen in InstCombine or DAGCombine, but why not avoid this compile-time overhead?
> I want to reuse the createAlignMask function. That function expects two vectors, not one vector and one undef.
You can pass an argument to DecodePALIGNRMask saying whether the resulting shuffle mask is for a single source vector or two vector sources. It just so happens that it will take the same value of AlignDirection.
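The suggestion can be sketched outside of LLVM roughly as follows (the helper name, the `Unary` flag, and the use of `std::vector` are illustrative only, not the actual patch code): with `Unary` set, every mask index is folded back into `[0, NumElts)`, so the second shuffle operand can be undef, as requested above.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch of a PALIGNR-style mask decoder. PALIGNR rotates the
// concatenation of two vectors right by Imm elements. With Unary=false the
// mask indexes the concatenation (two source vectors); with Unary=true the
// indices wrap modulo NumElts, producing a single-source mask whose second
// shuffle operand may be undef.
std::vector<uint32_t> decodePalignrMask(unsigned NumElts, unsigned Imm,
                                        bool Unary) {
  std::vector<uint32_t> Mask;
  for (unsigned i = 0; i != NumElts; ++i) {
    unsigned Idx = i + Imm;   // element position after the rotation
    if (Unary)
      Idx %= NumElts;         // fold back into the single source vector
    Mask.push_back(Idx);
  }
  return Mask;
}
```

For example, `decodePalignrMask(4, 1, true)` yields `{1, 2, 3, 0}`, which selects only from the first operand, while the binary form yields `{1, 2, 3, 4}`, whose last index reaches into the second operand.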


================
Comment at: lib/Target/X86/X86InterleavedAccess.cpp:349
+    for (int i = 0, LaneSize = VF / LaneCount; i != LaneSize; ++i) {
+      Mask.push_back((i * Stride) % (LaneSize) + (LaneSize) * Lane);
+    }
----------------
zvi wrote:
> Extra brackets
I meant these brackets: (LaneSize)


================
Comment at: lib/Target/X86/X86InterleavedAccess.cpp:364
+//  For example {0,3,6,1,4,7,2,5} => {3,3,2}
+static void setGroupSize(MVT VT, ArrayRef<uint32_t> Mask,
+                              SmallVectorImpl<uint32_t> &SizeInfo) {
----------------
Please try to make this function more efficient. Given Stride and VF you can compute the offsets of the first/last member of each group directly, and from those the first/last of the successor group, and so on.
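For the decimated mask shown in the comment above ({0,3,6,1,4,7,2,5}), the group sizes even have a closed form, so no scan of the mask is needed at all. A minimal sketch of that idea (the function name is hypothetical, and `std::vector` stands in for SmallVectorImpl):

```cpp
#include <cstdint>
#include <vector>

// Sketch: for a mask that lists all lane elements congruent to 0 mod Stride,
// then 1 mod Stride, etc., group g has exactly ceil((LaneSize - g) / Stride)
// members, so the sizes can be computed directly from Stride and LaneSize.
std::vector<uint32_t> groupSizes(unsigned LaneSize, unsigned Stride) {
  std::vector<uint32_t> Sizes;
  for (unsigned g = 0; g != Stride; ++g)
    Sizes.push_back((LaneSize - g + Stride - 1) / Stride);
  return Sizes;
}
```

For LaneSize = 8 and Stride = 3 this reproduces the {3, 3, 2} of the example, matching the mask {0,3,6,1,4,7,2,5}.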


================
Comment at: lib/Target/X86/X86InterleavedAccess.cpp:365
+static void setGroupSize(MVT VT, ArrayRef<uint32_t> Mask,
+                              SmallVectorImpl<uint32_t> &SizeInfo) {
+  int VectorSize = VT.getSizeInBits();
----------------
Please check indentation


================
Comment at: lib/Target/X86/X86InterleavedAccess.cpp:370
+  int Size[3] = {0, 0, 0};
+  for (int i = 0, Count = 0, GroupNumber = 0; i < VF / LaneCount; ++i) {
+    if (GroupNumber != Mask[i] % 3) {
----------------
for (int i = 0, Count = 0, GroupNumber = 0, e = VF / LaneCount; i != e; ++i)


================
Comment at: lib/Target/X86/X86InterleavedAccess.cpp:375
+    }
+    Size[Count]++;
+  }
----------------
The Size array is not really needed, as you can update SizeInfo on the fly.


================
Comment at: lib/Target/X86/X86InterleavedAccess.cpp:395
+//  align to the "left" side)
+void DecodePALIGNRMask(MVT VT, unsigned Imm,
+                       SmallVectorImpl<uint32_t> &ShuffleMask,
----------------
static


================
Comment at: test/Transforms/InterleavedAccess/X86/interleavedLoad.ll:2
 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt < %s -mtriple=x86_64-pc-linux -mattr=+avx2 -interleaved-access -S | FileCheck %s
 
----------------
Sorry, I noticed this only now:
Can you update this file with an AVX512 run and rebase the patch?
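One common way to do this is a second RUN line with shared and per-target check prefixes; the prefix names below are only illustrative, and utils/update_test_checks.py can then regenerate the assertions for both runs:

```llvm
; RUN: opt < %s -mtriple=x86_64-pc-linux -mattr=+avx2 -interleaved-access -S | FileCheck %s --check-prefixes=CHECK,AVX2
; RUN: opt < %s -mtriple=x86_64-pc-linux -mattr=+avx512f -interleaved-access -S | FileCheck %s --check-prefixes=CHECK,AVX512
```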


https://reviews.llvm.org/D36960





More information about the llvm-commits mailing list