[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

Mon Jun 24 02:02:06 PDT 2024

================
@@ -1701,17 +1732,33 @@ unsigned SILoadStoreOptimizer::getNewOpcode(const CombineInfo &CI,
       return AMDGPU::S_BUFFER_LOAD_DWORDX8_SGPR_IMM;
     }
   case S_LOAD_IMM:
-    switch (Width) {
-    default:
-      return 0;
-    case 2:
-      return AMDGPU::S_LOAD_DWORDX2_IMM;
-    case 3:
-      return AMDGPU::S_LOAD_DWORDX3_IMM;
-    case 4:
-      return AMDGPU::S_LOAD_DWORDX4_IMM;
-    case 8:
-      return AMDGPU::S_LOAD_DWORDX8_IMM;
+    // For targets that support XNACK replay, use the constrained load opcode.
+    if (STI && STI->hasXnackReplay()) {
+      switch (Width) {
----------------
cdevadas wrote:

> > currently the alignment is picked from the first MMO and that'd definitely be smaller than the natural align requirement for the new load
> 
> You don't know that - the alignment in the first MMO will be whatever alignment the compiler could deduce, which could be large, e.g. if the pointer used for the first load was known to have a large alignment.

Are you suggesting that we check the alignment in the first MMO to see whether it already satisfies the preferred alignment for the merged load, and use the _ec opcode only if that alignment turns out to be smaller than the expected value?
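
To make the question concrete, here is a rough standalone sketch of that check. It is not the code in this PR: `SLoadOpcode`, `needsConstrainedOpcode` and `pickSLoadOpcode` are hypothetical stand-ins for the real AMDGPU opcodes and for whatever helper would live in SILoadStoreOptimizer, and it assumes the expected alignment of the merged load is simply its size in bytes (Width * 4).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the AMDGPU::S_LOAD_DWORDX*_IMM opcodes and
// their constrained (_ec) counterparts; not the real LLVM enum values.
enum class SLoadOpcode { None, X2, X3, X4, X8, X2_ec, X3_ec, X4_ec, X8_ec };

// Returns true when the merged load should use the constrained opcode.
// Assumption: the expected alignment is the merged size in bytes
// (WidthDwords * 4), and the alignment comes from the first MMO.
bool needsConstrainedOpcode(bool HasXnackReplay, uint64_t FirstMMOAlignBytes,
                            unsigned WidthDwords) {
  assert(WidthDwords > 0 && "merged load must cover at least one dword");
  const uint64_t ExpectedAlignBytes = uint64_t(WidthDwords) * 4;
  return HasXnackReplay && FirstMMOAlignBytes < ExpectedAlignBytes;
}

// Picks the opcode for a merged S_LOAD of the given width, falling back to
// the _ec form only when the known alignment is too small.
SLoadOpcode pickSLoadOpcode(bool HasXnackReplay, uint64_t FirstMMOAlignBytes,
                            unsigned WidthDwords) {
  const bool EC =
      needsConstrainedOpcode(HasXnackReplay, FirstMMOAlignBytes, WidthDwords);
  switch (WidthDwords) {
  case 2:
    return EC ? SLoadOpcode::X2_ec : SLoadOpcode::X2;
  case 3:
    return EC ? SLoadOpcode::X3_ec : SLoadOpcode::X3;
  case 4:
    return EC ? SLoadOpcode::X4_ec : SLoadOpcode::X4;
  case 8:
    return EC ? SLoadOpcode::X8_ec : SLoadOpcode::X8;
  default:
    return SLoadOpcode::None; // unsupported width, mirrors "return 0"
  }
}
```

With this shape, a 4-dword merge whose first MMO is only known to be 4-byte aligned would get the _ec opcode, while the same merge with a 16-byte aligned first MMO would keep the regular opcode.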

https://github.com/llvm/llvm-project/pull/96162