[Mlir-commits] [mlir] [mlir][amdgpu] Add tensor load store operations (PR #172686)

Wed Dec 17 08:47:00 PST 2025

================
@@ -3306,6 +3327,30 @@ void mlir::populateAMDGPUTypeAndAttributeConversions(
     Type i32 = IntegerType::get(type.getContext(), 32);
     return typeConverter.convertType(VectorType::get(4, i32));
   });
+  typeConverter.addConversion(
+      [&](TDMDescriptorType type,
+          SmallVectorImpl<Type> &result) -> std::optional<LogicalResult> {
+        Type i32 = IntegerType::get(type.getContext(), 32);
+        Type v4i32 = typeConverter.convertType(VectorType::get(4, i32));
+        Type v8i32 = typeConverter.convertType(VectorType::get(8, i32));
+        llvm::append_values(result, v4i32, v8i32, v4i32, v4i32);
+        return success();
+      });
+
+  auto addUnrealizedCast = [](OpBuilder &builder, TypeRange types,
+                              ValueRange inputs,
+                              Location loc) -> SmallVector<Value> {
+    if (inputs.size() != 1)
+      return {};
+
+    if (!isa<TDMDescriptorType>(inputs[0].getType()))
----------------
amd-eochoalo wrote:

I'm writing some comments that will be added in the commit, but maybe it would also help to type it here for some discussion. 

The ROCM integration tests broke because I added a materialization cast that uses unrealized_conversion_cast.

```cxx
  auto addUnrealizedCast = [](OpBuilder &builder, TypeRange types,
                              ValueRange inputs,
                              Location loc) -> SmallVector<Value> {
    auto cast = UnrealizedConversionCastOp::create(builder, loc, types, inputs);
    return cast.getResults();
  };

  typeConverter.addTargetMaterialization(addUnrealizedCast);
```

This target materialization was intended only for `TDMDescriptorType`. Without it, we would get errors of the type:

```
error: unexpected error: failed to legalize unresolved materialization from ('!amdgpu.tdm_descriptor') to ('vector<4xi32>', 'vector<8xi32>', 'vector<4xi32>', 'vector<4xi32>') that remained live after conversion
  amdgpu.tensor_store_from_lds %desc : !amdgpu.tdm_descriptor
```

But this target materialization is applied too liberally and would be applied to other items like memrefs. (This is taken from the failed tests.)

```
# | <stdin>:11:12: error: LLVM Translation failed for operation: builtin.unrealized_conversion_cast
# |       %0 = builtin.unrealized_conversion_cast %arg2 : !llvm.ptr to !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
# |            ^
# | <stdin>:11:12: note: see current operation: %0 = "builtin.unrealized_conversion_cast"(%arg2) : (!llvm.ptr) -> !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
```

I originally thought that the need for this target materialization may be a bug in OneToNOpAdaptor since this explicit target materialization was not needed for the regular OpAdaptor. But I asked about this and became convinced that adding the explicit target materialization is ok https://discord.com/channels/636084430946959380/642426447167881246/1445463097262211203

So, in order to avoid adding unrealized_conversion_cast everywhere, we just make it more restrictive and specify its use only in the case of TDMDescriptorType.

I'm still not too sure why it is explicitly needed when it is not needed for TDMBaseType (which does not use OneToNOpAdaptor).

https://github.com/llvm/llvm-project/pull/172686