[llvm] [AMDGPU][DAG] Enable ganging up of memcpy loads/stores for AMDGPU (PR #96185)

Thu Jun 20 07:16:15 PDT 2024

================
@@ -67,6 +67,9 @@ AMDGPUTargetLowering::AMDGPUTargetLowering(const TargetMachine &TM,
   MaxStoresPerMemcpy = MaxStoresPerMemcpyOptSize = ~0U;
   MaxStoresPerMemmove = MaxStoresPerMemmoveOptSize = ~0U;
 
+  // Enable ganging up loads and stores in the memcpy DAG lowering.
----------------
ritter-x2a wrote:

The current default is 0, disabling this transformation. The original author only enabled it for AArch64 and kept the previous behavior for the other targets.

In general, I don't think it is obvious that this "optimization" (in a loose sense) improves performance on every target; it effectively restricts the choices available to later stages of codegen.
If we want it enabled by default, we would need to benchmark the other targets to make sure their performance does not regress.

https://github.com/llvm/llvm-project/pull/96185