[Mlir-commits] [mlir] [mlir][tosa][tosa-to-linalg] tosa.cast: fix answer mismatch to cast f64/f32 max value to i64/i32 (PR #130116)

Fri Mar 7 13:47:13 PST 2025

Hanumanth04 wrote:

> A difference with tensorflow isn't enough reason to change the TOSA implementation. Are the new semantics mandated by the TOSA spec? It [says](https://www.mlplatform.org/tosa/tosa_spec.html#_cast)
> 
> ```
> Casting from floating-point to integer:
> ...
> - Result overflows must be saturated.
> ```
> 
> and the pseudo implementation is
> 
> ```
> out = truncate<out_t>(apply_clip_s<i32_t>(round_to_nearest_int(in), minimum_s<out_t>(), maximum_s<out_t>()));
> ```
> 
> which tells me that the current behavior of casting F64 max to i64 9223372036854775807 is correct.

My two cents on the discussion:
According to the C++ standard (see the "Floating-integral conversions" section at the link below), the behavior is undefined for the cases mentioned in the solution description, because the truncated value cannot be represented in the destination type. I understand that the TOSA specification chose saturation for this case. However, when we check the numerics of translated output, we generally compare against PyTorch and TensorFlow output. So making the TOSA cast specification match PyTorch and TensorFlow would probably help here. At the very least, I can see how this would be beneficial when comparing `single`-precision numerics.

https://en.cppreference.com/w/cpp/language/implicit_conversion
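
To make the difference concrete, here is a minimal C++ sketch of a saturating f64-to-i64 conversion in the spirit of the TOSA rule quoted above (round to nearest, then clip to the output range). The helper name `saturating_cast_i64` is hypothetical, and mapping NaN to 0 is an assumption of this sketch rather than something taken from the discussion. It is not the tosa-to-linalg lowering itself.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper: saturating double -> int64_t conversion.
// A plain static_cast<std::int64_t>(x) is undefined behavior when the
// truncated value of x does not fit in int64_t, so the range is checked first.
std::int64_t saturating_cast_i64(double in) {
  // Assumption of this sketch: NaN maps to 0.
  if (std::isnan(in)) return 0;

  // Round to nearest integer (ties to even in the default rounding mode).
  double rounded = std::nearbyint(in);

  // INT64_MAX is not exactly representable as a double, but 2^63 is,
  // so the comparison is done against +/- 2^63.
  constexpr double kTwoPow63 = 9223372036854775808.0;
  if (rounded >= kTwoPow63) return std::numeric_limits<std::int64_t>::max();
  if (rounded <= -kTwoPow63) return std::numeric_limits<std::int64_t>::min();

  // In range: the conversion is now well defined.
  return static_cast<std::int64_t>(rounded);
}
```

With this helper, `saturating_cast_i64(std::numeric_limits<double>::max())` yields `INT64_MAX` (9223372036854775807), which is the saturated result the quoted spec text describes. A bare `static_cast` on the same input has no guaranteed result.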

https://github.com/llvm/llvm-project/pull/130116