[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

Mon Jun 24 02:47:42 PDT 2024

================
@@ -1701,17 +1732,33 @@ unsigned SILoadStoreOptimizer::getNewOpcode(const CombineInfo &CI,
       return AMDGPU::S_BUFFER_LOAD_DWORDX8_SGPR_IMM;
     }
   case S_LOAD_IMM:
-    switch (Width) {
-    default:
-      return 0;
-    case 2:
-      return AMDGPU::S_LOAD_DWORDX2_IMM;
-    case 3:
-      return AMDGPU::S_LOAD_DWORDX3_IMM;
-    case 4:
-      return AMDGPU::S_LOAD_DWORDX4_IMM;
-    case 8:
-      return AMDGPU::S_LOAD_DWORDX8_IMM;
+    // For targets that support XNACK replay, use the constrained load opcode.
+    if (STI && STI->hasXnackReplay()) {
+      switch (Width) {
----------------
rampitec wrote:

> > currently the alignment is picked from the first MMO and that'd definitely be smaller than the natural align requirement for the new load
> 
> You don't know that - the alignment in the first MMO will be whatever alignment the compiler could deduce, which could be large, e.g. if the pointer used for the first load was known to have a large alignment.

Moreover, it can easily be as large as a page, e.g. in the case of a scalar load and kernarg.
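
To make the point concrete, here is a minimal sketch of the comparison being discussed. It is not code from the PR: `naturalAlignBytes` and `isNaturallyAligned` are hypothetical helpers, and it assumes that "natural alignment" means the merged access size in bytes rounded up to a power of two. The known alignment would be whatever the first MMO reports.

```cpp
#include <cstdint>

// Hypothetical helper (not part of SILoadStoreOptimizer).
// Assumption: the natural alignment of a load of WidthDwords dwords is its
// size in bytes rounded up to the next power of two.
constexpr uint64_t naturalAlignBytes(unsigned WidthDwords) {
  uint64_t Size = static_cast<uint64_t>(WidthDwords) * 4;
  uint64_t Pow2 = 1;
  while (Pow2 < Size)
    Pow2 <<= 1;
  return Pow2;
}

// Hypothetical helper: true if the alignment deduced for the first load
// already satisfies the natural alignment of the merged load.
constexpr bool isNaturallyAligned(uint64_t KnownAlignBytes,
                                  unsigned WidthDwords) {
  return KnownAlignBytes >= naturalAlignBytes(WidthDwords);
}
```

Under these assumptions, a known alignment of 4 bytes is not enough for a merged DWORDX2 load, while a page-aligned pointer (4096 bytes, as can happen for kernarg) satisfies even DWORDX8. So the first MMO's alignment cannot be assumed to be too small; it has to be checked.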

https://github.com/llvm/llvm-project/pull/96162