[llvm] [AMDGPU] Use wider loop lowering type for LowerMemIntrinsics (PR #112332)
Matt Arsenault via llvm-commits
llvm-commits at lists.llvm.org
Wed Oct 16 05:17:29 PDT 2024
================
@@ -75,6 +75,13 @@ static cl::opt<size_t> InlineMaxBB(
cl::desc("Maximum number of BBs allowed in a function after inlining"
" (compile time constraint)"));
+// This default unroll factor is based on microbenchmarks on gfx1030.
+static cl::opt<unsigned> MemcpyLoopUnroll(
+ "amdgpu-memcpy-loop-unroll",
+ cl::desc("Unroll factor (affecting 4x32-bit operations) to use for memory "
+ "operations when lowering memcpy as a loop, must be a power of 2"),
----------------
arsenm wrote:
So the GEPs are used incorrectly. You can always do the indexing in byte units; you don't need to preserve the type this way.
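
A minimal sketch of the byte-unit indexing point, written in plain C++ rather than with LLVM's IR builder. The names `Chunk16`, `typedAddress`, `byteAddress`, and `copyInChunks` are hypothetical and do not come from the PR. It assumes a 16-byte wide access type and only illustrates that an address computed as a typed index equals the same address computed as a byte offset, so the loop can carry a byte offset without preserving the element type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a wide loop-lowering type (4 x 32-bit lanes).
struct Chunk16 {
  std::uint32_t Lanes[4];
};
static_assert(sizeof(Chunk16) == 16, "sketch assumes a 16-byte chunk");

// Typed indexing: the stride is implied by the pointee type, analogous to a
// GEP whose source element type is the wide type.
inline const void *typedAddress(const Chunk16 *Base, std::size_t Index) {
  return Base + Index;
}

// Byte-unit indexing: the stride is an explicit byte count, analogous to a
// GEP over i8 with the offset scaled by the access size.
inline const void *byteAddress(const Chunk16 *Base, std::size_t Index) {
  const unsigned char *Bytes = reinterpret_cast<const unsigned char *>(Base);
  return Bytes + Index * sizeof(Chunk16);
}

// Copy loop that tracks only a byte offset. The wide accesses go through
// memcpy, so no typed pointer is needed; a byte-wise loop handles the tail.
inline void copyInChunks(unsigned char *Dst, const unsigned char *Src,
                         std::size_t NumBytes) {
  assert((Dst != nullptr && Src != nullptr) || NumBytes == 0);
  std::size_t Offset = 0;
  for (; Offset + sizeof(Chunk16) <= NumBytes; Offset += sizeof(Chunk16))
    std::memcpy(Dst + Offset, Src + Offset, sizeof(Chunk16));
  for (; Offset < NumBytes; ++Offset)
    Dst[Offset] = Src[Offset];
}
```

Both address helpers yield the same pointer for any in-bounds index, which is why the typed form adds nothing to the lowering.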
https://github.com/llvm/llvm-project/pull/112332