[clang] [llvm] AMDGPU: Implement tensor load and store instructions for gfx1250 (PR #146636)

Tue Jul 1 23:56:34 PDT 2025

================
@@ -621,6 +621,32 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
     llvm::Function *F = CGM.getIntrinsic(IID, {LoadTy});
     return Builder.CreateCall(F, {Addr});
   }
+  case AMDGPU::BI__builtin_amdgcn_tensor_load_to_lds:
+  case AMDGPU::BI__builtin_amdgcn_tensor_load_to_lds_d2:
+  case AMDGPU::BI__builtin_amdgcn_tensor_store_from_lds:
+  case AMDGPU::BI__builtin_amdgcn_tensor_store_from_lds_d2: {
+    Intrinsic::ID IID;
+    switch (BuiltinID) {
+    case AMDGPU::BI__builtin_amdgcn_tensor_load_to_lds:
+      IID = Intrinsic::amdgcn_tensor_load_to_lds;
+      break;
+    case AMDGPU::BI__builtin_amdgcn_tensor_load_to_lds_d2:
+      IID = Intrinsic::amdgcn_tensor_load_to_lds_d2;
+      break;
+    case AMDGPU::BI__builtin_amdgcn_tensor_store_from_lds:
+      IID = Intrinsic::amdgcn_tensor_store_from_lds;
+      break;
+    case AMDGPU::BI__builtin_amdgcn_tensor_store_from_lds_d2:
+      IID = Intrinsic::amdgcn_tensor_store_from_lds_d2;
+      break;
+    }
+
+    SmallVector<Value *, 5> Args;
+    for (int i = 0, e = E->getNumArgs(); i != e; ++i)
+      Args.push_back(EmitScalarExpr(E->getArg(i)));
----------------
changpeng wrote:

We have to consider the other instructions in the downstream branch as well. In addition, we should be consistent across the design,
so maybe we should make a complete change after the upstreaming.

https://github.com/llvm/llvm-project/pull/146636