[Mlir-commits] [mlir] [mlir][bufferization] Use Type instead of Value in unknown conversion (PR #144658)
Andrei Golubev
llvmlistbot at llvm.org
Tue Jul 1 01:18:03 PDT 2025
andrey-golubev wrote:
> > Generally, bufferization should be able to create a memref from a tensor without needing to know more than just an `mlir::Type`.
>
> Is it true?
I would imagine so. At least this makes sense conceptually: a type goes in, a type comes out. It is a type conversion, not a value-to-type conversion.
> In [IREE](https://github.com/iree-org/iree/blob/63f625d428a31e6ccf2fd594e544c3bef659c63f/compiler/src/iree/compiler/Codegen/Common/IREEComprehensiveBufferizePass.cpp#L143-L164), we have special logic for constants. I don't remember all the details; my guess is that we'd like to use private memory for small constants. I can try to make our project happy, but the change itself looks off to me. We name the function `unknownTypeConverterFn`, yet you always pass tensor types. I was wondering whether passing a Value lets you handle custom tensor types better, because you can define and use your own type system in your dialect.
Thus far, what Matthias and I have come up with is: TensorLike + BufferLike give us custom type support, while the options serve the builtin tensor -> builtin memref conversion. I guess this makes sense (I haven't seen issues, but I'm only midway through these changes): the unknown type conversion is kind of a last-mile fallback (for builtins?). Supposedly, if you're already inside a custom type (via TensorLike), you wouldn't need it?
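Sketching how that split might look in IR (the custom dialect and type names here are purely hypothetical, for illustration):

```mlir
// A custom tensor type implementing TensorLike chooses its own buffer type
// via BufferLike; the unknown-type fallback is never consulted:
%0 = bufferization.to_buffer %custom : !mydialect.tensor<5xi32> to !mydialect.buffer<5xi32>

// A builtin tensor with no more specific conversion falls back to the
// unknownTypeConverterFn, which by default picks a fully dynamic layout:
%1 = bufferization.to_buffer %t : tensor<5xi32> to memref<5xi32, strided<[?], offset: ?>>
```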
>
> The lit test failure in IREE is that we always expect [identity layout for constants](https://github.com/iree-org/iree/blob/63f625d428a31e6ccf2fd594e544c3bef659c63f/compiler/src/iree/compiler/Codegen/Common/test/iree_comprehensive_bufferize.mlir#L1404-L1406). Without passing the value and checking whether it is a constant, we'll create a memref type with a fully dynamic layout, even though the constant is known to have a static identity layout. Is there a way to recover the behavior?
>
> Original output:
>
> ```mlir
> %cst_0 = arith.constant dense<[1, 2, 3, 4, 5]> : tensor<5xi32>
> %0 = bufferization.to_buffer %cst_0 : tensor<5xi32> to memref<5xi32>
> ```
>
> With the change, we always create dynamic layout for constants:
>
> ```mlir
> %cst_0 = arith.constant dense<[1, 2, 3, 4, 5]> : tensor<5xi32>
> %0 = bufferization.to_buffer %cst_0 : tensor<5xi32> to memref<5xi32, strided<[?], offset: ?>>
> ```
Honestly, I am completely lost in all these layout peculiarities upstream. Our downstream does it slightly differently: we generally have strides only for "subview"-like operations (e.g. in tiling), and all other IR assumes "dense" (if I may) buffers. What we actually do for constants, though, is strip the strides manually (via a [canonicalizer](https://github.com/openvinotoolkit/npu_compiler/blob/d5d219ee0ffa5c6a0e3af6b99675b94e28e25583/src/vpux_compiler/src/dialect/const/ops.cpp#L314) - because we have our own constant operation).

Now that I think of it, perhaps this is exactly your problem as well? Maybe the issue is the *default* semantics? I plan to look at the builtin tensor -> memref conversion too and make sure the tensor encoding gets mapped correctly to the memref layout. Perhaps it makes sense to revisit what should be done w.r.t. dynamic layouts to solve this issue for both of us?
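For what it's worth, the manual stride-stripping we do downstream could be sketched upstream as a follow-up pattern that refines the layout back to identity (a sketch, assuming the constant's buffer really is contiguous, so the cast holds at runtime):

```mlir
%cst = arith.constant dense<[1, 2, 3, 4, 5]> : tensor<5xi32>
// With the change, bufferization picks a fully dynamic layout:
%0 = bufferization.to_buffer %cst : tensor<5xi32> to memref<5xi32, strided<[?], offset: ?>>
// A canonicalization could then cast back to the identity layout;
// memref.cast allows this when the dynamic strides/offset match at runtime:
%1 = memref.cast %0 : memref<5xi32, strided<[?], offset: ?>> to memref<5xi32>
```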
https://github.com/llvm/llvm-project/pull/144658