[llvm] [AMDGPU] Use wider loop lowering type for LowerMemIntrinsics (PR #112332)

Tue Oct 15 01:36:10 PDT 2024

================
@@ -442,17 +444,46 @@ Type *GCNTTIImpl::getMemcpyLoopLoweringType(
     return FixedVectorType::get(Type::getInt32Ty(Context), 2);
   }
 
-  // Global memory works best with 16-byte accesses. Private memory will also
-  // hit this, although they'll be decomposed.
-  return FixedVectorType::get(Type::getInt32Ty(Context), 4);
+  // Global memory works best with 16-byte accesses.
+  // If the operation has a fixed known length that is large enough, it is
+  // worthwhile to return an even wider type and let legalization lower it into
+  // multiple accesses, effectively unrolling the memcpy loop. Private memory
+  // also hits this, although accesses may be decomposed.
+  //
+  // Don't unroll if
+  // - Length is not a constant, since unrolling leads to worse performance for
+  //   length values that are smaller or slightly larger than the total size of
+  //   the type returned here. Mitigating that would require a more complex
+  //   lowering for variable-length memcpy and memmove.
+  // - the memory operations would be split further into byte-wise accesses
+  //   because of their (mis)alignment, since that would lead to a huge code
+  //   size increase.
+  unsigned I32EltsInVector = 4;
+  if (MemcpyLoopUnroll > 0 && isa<ConstantInt>(Length)) {
+    unsigned VectorSizeBytes = I32EltsInVector * 4;
+    unsigned VectorSizeBits = VectorSizeBytes * 8;
+    unsigned UnrolledVectorBytes = VectorSizeBytes * MemcpyLoopUnroll;
+    Align PartSrcAlign(commonAlignment(SrcAlign, UnrolledVectorBytes));
+    Align PartDestAlign(commonAlignment(DestAlign, UnrolledVectorBytes));
+
+    const SITargetLowering *TLI = this->getTLI();
+    bool SrcNotSplit = TLI->allowsMisalignedMemoryAccessesImpl(
----------------
arsenm wrote:

Does it really matter if it's unaligned? I'd think we would still be better off issuing all the split loads, at least partially.
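
For readers following the question above, here is a rough, standalone sketch of the decision the new comment in the hunk describes: return a wider type only when the length is a compile-time constant and neither side would be split by misalignment. It is not the code from the PR. The hunk is cut off at the `allowsMisalignedMemoryAccessesImpl` call, so how that result is used is an assumption here. `commonAlignmentSketch`, `accessNotSplitSketch`, and `pickI32EltsSketch` are hypothetical names, and the 4-byte threshold in `accessNotSplitSketch` is a placeholder for the target query rather than the real AMDGPU rule.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for llvm::commonAlignment(Align, uint64_t): the largest power of
// two that divides both the known alignment and the offset.
inline uint64_t commonAlignmentSketch(uint64_t AlignBytes, uint64_t Offset) {
  assert(AlignBytes != 0 && (AlignBytes & (AlignBytes - 1)) == 0 &&
         "alignment must be a power of two");
  uint64_t Bits = AlignBytes | Offset;
  return Bits & (~Bits + 1); // isolate the lowest set bit
}

// ASSUMPTION: placeholder for the target's misaligned-access query. Here an
// access counts as "not split into byte-wise pieces" if it is at least
// 4-byte aligned. The real check depends on address space and subtarget.
inline bool accessNotSplitSketch(uint64_t AlignBytes) {
  return AlignBytes >= 4;
}

// Returns how many i32 elements the loop-lowering vector type would have.
// ASSUMPTION: the unrolled width is used only if both source and destination
// pass the "not split" check; otherwise fall back to the 16-byte type.
inline unsigned pickI32EltsSketch(bool LengthIsConstant, unsigned UnrollFactor,
                                  uint64_t SrcAlign, uint64_t DestAlign) {
  unsigned I32EltsInVector = 4; // 16-byte accesses, as in the hunk
  if (UnrollFactor > 0 && LengthIsConstant) {
    unsigned VectorSizeBytes = I32EltsInVector * 4;
    unsigned UnrolledVectorBytes = VectorSizeBytes * UnrollFactor;
    uint64_t PartSrcAlign =
        commonAlignmentSketch(SrcAlign, UnrolledVectorBytes);
    uint64_t PartDestAlign =
        commonAlignmentSketch(DestAlign, UnrolledVectorBytes);
    if (accessNotSplitSketch(PartSrcAlign) &&
        accessNotSplitSketch(PartDestAlign))
      return I32EltsInVector * UnrollFactor;
  }
  return I32EltsInVector;
}
```

Under these assumptions, a constant-length copy with 16-byte-aligned operands and an unroll factor of 4 would get a 16 x i32 type, while a 1-byte-aligned source or a variable length would keep the 4 x i32 type. The review question amounts to asking whether the alignment branch of that check is needed at all.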

https://github.com/llvm/llvm-project/pull/112332