[PATCH] D23068: LoadStoreVectorizer: Remove TargetBaseAlign. Keep alignment for stack adjustments.

Tue Aug 2 13:26:56 PDT 2016

jlebar added a comment.

lgtm, but I can't speak to the correctness of the amdgpu changes, so I'd like arsenm to sign off on them.


================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:443
@@ +442,3 @@
+  // Until MVT is extended to handle this, simply check for the size and
+  // rely on the condition below: allow accesses is the size is a multiple of 4.
+  if((VT == MVT::Other) || (VT != MVT::Other && VT.getSizeInBits() > 1024)) {
----------------
if the size

================
Comment at: test/Transforms/LoadStoreVectorizer/X86/preserve-order32.ll:21
@@ -22,1 +20,3 @@
+  %buff.p = load i8*, i8** %tmp1, align 4
+  %buff.val = load i8, i8* %buff.p, align 4
   store i8 0, i8* %buff.p, align 8
----------------
asbirlea wrote:
> jlebar wrote:
> > Why the change on this line?  Maybe you don't need any align for this load?
> This is not necessary. I can remove it completely for these 2 lines with the same outcome, but the one below for address 0 is required. I haven't looked in detail into what are the x86 requirements here.
> I only changes it to 4 because it looked right considering we're using i32.
> I only changes it to 4 because it looked right considering we're using i32.

Eh, it's x86, it doesn't really care.  I'd either (a) change as little as possible, or (b) make the tests as simple as possible (removing unnecessary alignments, although e.g. if you align arr[0] to 8 bytes, saying that arr[1] is aligned to 4b doesn't seem so unreasonable).

Not a big deal, though.

================
Comment at: test/Transforms/LoadStoreVectorizer/X86/preserve-order64.ll:25
@@ -24,3 +24,3 @@
   %tmp0 = getelementptr inbounds %struct.buffer_t, %struct.buffer_t* %buff, i64 0, i32 0
-  %buff.int = load i64, i64* %tmp0, align 8
+  %buff.int = load i64, i64* %tmp0, align 16
   ret void
----------------
asbirlea wrote:
> jlebar wrote:
> > I thought x86 doesn't care about the alignment?  Or, is it a question of the "fast" param?
> Related to the comment in the 32bit version of this, this is required for x86, haven't investigated in detail why. The other alignment directives above (buff.p and buff.val) are not needed.
It sounds like x86 is requiring us to align the whole vector.  It's probably the "fast" thing.  That's fine.


https://reviews.llvm.org/D23068