[llvm] [AMDGPU] Use wider loop lowering type for LowerMemIntrinsics (PR #112332)
Matt Arsenault via llvm-commits
llvm-commits at lists.llvm.org
Tue Oct 15 01:36:10 PDT 2024
================
@@ -442,17 +444,46 @@ Type *GCNTTIImpl::getMemcpyLoopLoweringType(
return FixedVectorType::get(Type::getInt32Ty(Context), 2);
}
- // Global memory works best with 16-byte accesses. Private memory will also
- // hit this, although they'll be decomposed.
- return FixedVectorType::get(Type::getInt32Ty(Context), 4);
+ // Global memory works best with 16-byte accesses.
+ // If the operation has a fixed known length that is large enough, it is
+ // worthwhile to return an even wider type and let legalization lower it into
+ // multiple accesses, effectively unrolling the memcpy loop. Private memory
+ // also hits this, although accesses may be decomposed.
+ //
+ // Don't unroll if
+ // - Length is not a constant, since unrolling leads to worse performance for
+ // length values that are smaller or slightly larger than the total size of
+ // the type returned here. Mitigating that would require a more complex
+ // lowering for variable-length memcpy and memmove.
+ // - the memory operations would be split further into byte-wise accesses
+ // because of their (mis)alignment, since that would lead to a huge code
+ // size increase.
+ unsigned I32EltsInVector = 4;
+ if (MemcpyLoopUnroll > 0 && isa<ConstantInt>(Length)) {
+ unsigned VectorSizeBytes = I32EltsInVector * 4;
+ unsigned VectorSizeBits = VectorSizeBytes * 8;
+ unsigned UnrolledVectorBytes = VectorSizeBytes * MemcpyLoopUnroll;
+ Align PartSrcAlign(commonAlignment(SrcAlign, UnrolledVectorBytes));
+ Align PartDestAlign(commonAlignment(DestAlign, UnrolledVectorBytes));
+
+ const SITargetLowering *TLI = this->getTLI();
+ bool SrcNotSplit = TLI->allowsMisalignedMemoryAccessesImpl(
----------------
arsenm wrote:
Does it really matter if it's unaligned? I'd think we would still be better off issuing all the split loads. At least partially.
https://github.com/llvm/llvm-project/pull/112332
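
For readers following the size arithmetic in the hunk above, here is a minimal standalone sketch of it. It is not the patch's code: `commonAlign` is a hypothetical stand-in that assumes LLVM's `commonAlignment` returns the largest power of two dividing both arguments, `computeSizes` and `UnrollSizes` are illustrative names, and the unroll factor is passed in as a parameter because the value of `MemcpyLoopUnroll` is not shown in the hunk.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for LLVM's commonAlignment(Align, uint64_t).
// Assumption: it yields the largest power of two that divides both values,
// i.e. the lowest set bit of (A | Offset). Both inputs must be non-zero.
static uint64_t commonAlign(uint64_t A, uint64_t Offset) {
  assert(A != 0 && Offset != 0 && "alignment and offset must be non-zero");
  uint64_t Bits = A | Offset;
  return Bits & (~Bits + 1);
}

// Illustrative container for the three sizes computed in the hunk.
struct UnrollSizes {
  unsigned VectorSizeBytes;     // size of one <N x i32> access
  unsigned VectorSizeBits;      // same size in bits
  unsigned UnrolledVectorBytes; // bytes covered by one unrolled iteration
};

// Mirrors the arithmetic in the diff; UnrollFactor plays the role of
// MemcpyLoopUnroll, whose actual value is not given in the hunk.
static UnrollSizes computeSizes(unsigned I32EltsInVector,
                                unsigned UnrollFactor) {
  assert(UnrollFactor > 0 && "unrolling is only considered for factors > 0");
  UnrollSizes S;
  S.VectorSizeBytes = I32EltsInVector * 4; // an i32 is 4 bytes
  S.VectorSizeBits = S.VectorSizeBytes * 8;
  S.UnrolledVectorBytes = S.VectorSizeBytes * UnrollFactor;
  return S;
}
```

With `I32EltsInVector = 4` and an example factor of 4, one unrolled iteration covers 64 bytes. Under the assumption above, a 4-byte-aligned source gives a per-part alignment of `commonAlign(4, 64) == 4`, which is the kind of under-aligned case the review question is about.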