[llvm] [AMDGPU] tensor_{load_to/store_from}_lds => ..._d2 simplification (PR #171540)
Matt Arsenault via llvm-commits
llvm-commits at lists.llvm.org
Wed Dec 10 05:58:08 PST 2025
================
@@ -1737,6 +1737,26 @@ GCNTTIImpl::instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II) const {
NewII->takeName(&II);
return IC.replaceInstUsesWith(II, NewII);
}
+ case Intrinsic::amdgcn_tensor_load_to_lds:
+ case Intrinsic::amdgcn_tensor_store_from_lds: {
+ Value *D2 = II.getArgOperand(2);
+ Value *D3 = II.getArgOperand(3);
+ // We know that not passing the second and third tensor DMA groups is
+ // equivalent to passing zeroes for those registers, so we rewrite to the
+ // shorter form here.
+ if (!match(D2, m_Zero()) || !match(D3, m_Zero()))
----------------
arsenm wrote:
Can you also do this for undef?
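
To make the request concrete, here is a minimal standalone sketch of the guard extended to accept undef as well as zero. It is a model of the condition only, not the patch itself: `OperandKind`, `isDroppable`, and `canUseD2Form` are hypothetical names invented for illustration. In the actual combine, the check would presumably be expressed with LLVM's pattern matchers, for example by combining `m_Zero()` and `m_Undef()` via `m_CombineOr`, but that is an assumption about how the author might choose to write it.

```cpp
#include <cassert>

// Hypothetical stand-in for what an intrinsic operand can be.
// This is NOT LLVM's API; it only models the three cases of interest.
enum class OperandKind { Zero, Undef, NonZero };

// An operand can be dropped if it is zero or, following the review
// suggestion, undef (which may be freely treated as zero).
constexpr bool isDroppable(OperandKind K) {
  return K == OperandKind::Zero || K == OperandKind::Undef;
}

// Models the guard in the quoted hunk: the shorter _d2 form is only
// usable when both the D2 and D3 operands can be dropped.
constexpr bool canUseD2Form(OperandKind D2, OperandKind D3) {
  return isDroppable(D2) && isDroppable(D3);
}
```

With this model, a call such as `canUseD2Form(OperandKind::Zero, OperandKind::Undef)` is accepted, while any non-zero operand in either position keeps the original four-group form.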
https://github.com/llvm/llvm-project/pull/171540