[llvm] [AMDGPU] tensor_{load_to/store_from}_lds => ..._d2 simplification (PR #171540)

Wed Dec 10 05:58:08 PST 2025

================
@@ -1737,6 +1737,26 @@ GCNTTIImpl::instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II) const {
     NewII->takeName(&II);
     return IC.replaceInstUsesWith(II, NewII);
   }
+  case Intrinsic::amdgcn_tensor_load_to_lds:
+  case Intrinsic::amdgcn_tensor_store_from_lds: {
+    Value *D2 = II.getArgOperand(2);
+    Value *D3 = II.getArgOperand(3);
+    // We know that not passing the second and third tensor DMA groups is
+    // equivalent to passing zeroes for those registers, so we rewrite to the
+    // shorter form here.
+    if (!match(D2, m_Zero()) || !match(D3, m_Zero()))
----------------
arsenm wrote:

Can you also do this for undef operands?

https://github.com/llvm/llvm-project/pull/171540