[llvm] [AMDGPU] Use wider loop lowering type for LowerMemIntrinsics (PR #112332)
Fabian Ritter via llvm-commits
llvm-commits at lists.llvm.org
Wed Oct 23 05:30:20 PDT 2024
================
@@ -75,6 +75,13 @@ static cl::opt<size_t> InlineMaxBB(
cl::desc("Maximum number of BBs allowed in a function after inlining"
" (compile time constraint)"));
+// This default unroll factor is based on microbenchmarks on gfx1030.
+static cl::opt<unsigned> MemcpyLoopUnroll(
+ "amdgpu-memcpy-loop-unroll",
+ cl::desc("Unroll factor (affecting 4x32-bit operations) to use for memory "
+ "operations when lowering memcpy as a loop, must be a power of 2"),
----------------
ritter-x2a wrote:
I just rebased this PR on top of trunk, which now includes the merged #112707. I also removed the now-obsolete StoreSize==AllocSize assertions and, in 734289baec944c8ccf495f012bf58d1bff168f3b, added test checks for the cases where those assertions would have been violated.
https://github.com/llvm/llvm-project/pull/112332