[PATCH] D114964: [DAG] Create fptoui.sat from clamped fptoui

Thu Dec 2 13:26:32 PST 2021

dmgreen added a comment.

In D114964#3167836 <https://reviews.llvm.org/D114964#3167836>, @spatel wrote:

> This seems fine as an extension of the previous patch. 
> I haven't been following the progress in this area closely though. What prevents folding these patterns to the saturating intrinsics in IR?
> https://llvm.org/docs/LangRef.html#llvm-fptoui-sat-intrinsic

An fptoui.sat is more defined than an fptoui + umin. For the fptoui, any out-of-range value produces poison; for the fptoui.sat, out-of-range values are defined to saturate. So the transform isn't reversible, and on some architectures it produces worse code.
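
As a rough illustration of why the fold only goes one way, here is a small Python model of the two forms. It is not LLVM code: `POISON`, `fptoui`, `clamped_fptoui` and `fptoui_sat_model` are hypothetical names made up for this sketch, and the i32 intermediate width and the limit of 255 are just example choices.

```python
import math

# Stand-in for LLVM's poison value; purely illustrative.
POISON = object()

def fptoui(x: float, bits: int):
    """Model of a plain fptoui to iN: poison if the truncated value does not fit."""
    if math.isnan(x) or math.isinf(x):
        return POISON
    t = math.trunc(x)  # round toward zero
    if t < 0 or t >= (1 << bits):
        return POISON
    return t

def clamped_fptoui(x: float, bits: int, limit: int):
    """Model of the source pattern umin(fptoui(x), limit); poison propagates."""
    v = fptoui(x, bits)
    return POISON if v is POISON else min(v, limit)

def fptoui_sat_model(x: float, bits: int) -> int:
    """Model of fptoui.sat to iN: every input has a defined result."""
    max_val = (1 << bits) - 1
    if math.isnan(x) or x <= 0:
        return 0
    if x >= max_val:
        return max_val
    return math.trunc(x)

# In range for the i32 conversion: both forms agree.
print(clamped_fptoui(300.0, 32, 255), fptoui_sat_model(300.0, 8))
# Out of range for i32: the clamped form is poison, the saturating form is not.
print(clamped_fptoui(1e10, 32, 255) is POISON, fptoui_sat_model(1e10, 8))
```

Wherever the clamped form is defined, the saturating form gives the same value, so replacing the former with the latter is a refinement. Going the other way would turn defined results (for inputs such as 1e10 or -1.0) into poison.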

More concretely, an `iN fptoui.sat(X)` can be expanded into `fptoui(fmax(fmin(X, (float)(2^N - 1)), 0))` (or something else with float compares and int selects if fmin/fmax are not available). Plus it needs to handle NaN for fptosi, making sure it becomes 0.
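
A minimal sketch of that kind of clamp-then-convert expansion, again as a Python model rather than actual lowering code. The helpers `fmaxnum` and `fminnum` are hypothetical and assume minNum/maxNum-style behaviour (if exactly one operand is NaN, the other operand is returned). Under that assumption the sketch clamps from below first, so that a NaN input lands on the lower bound; the order of the two clamps is a choice made for this sketch.

```python
import math

def fmaxnum(a: float, b: float) -> float:
    """maxNum-style maximum: a single NaN operand is ignored (assumed semantics)."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return max(a, b)

def fminnum(a: float, b: float) -> float:
    """minNum-style minimum: a single NaN operand is ignored (assumed semantics)."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return min(a, b)

def expand_fptoui_sat(x: float, bits: int) -> int:
    """Clamp to [0, 2^bits - 1] in floating point, then convert."""
    hi = float((1 << bits) - 1)  # exactly representable in a double for small bits
    clamped = fminnum(fmaxnum(x, 0.0), hi)  # NaN becomes 0.0 at the first clamp
    return math.trunc(clamped)

def expand_fptosi_sat(x: float, bits: int) -> int:
    """Signed variant: clamping alone would map NaN to the lower bound, not 0."""
    lo = float(-(1 << (bits - 1)))
    hi = float((1 << (bits - 1)) - 1)
    clamped = fminnum(fmaxnum(x, lo), hi)
    # Extra select so that NaN becomes 0, as the saturating conversion requires.
    return 0 if math.isnan(x) else math.trunc(clamped)

print(expand_fptoui_sat(300.0, 8), expand_fptoui_sat(float("nan"), 8))
print(expand_fptosi_sat(-1000.0, 8), expand_fptosi_sat(float("nan"), 8))
```

The sketch relies on the integer bounds being exactly representable in the floating-point type. Where they are not (for example a 32-bit bound in a 32-bit float), the compare-and-select form mentioned above would be the safer route.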

I would actually really like it to be done in IR. We would need to vectorize these if we can, and they are much simpler to vectorize if we already have the scalar instructions. But it needs to be a costed decision, not an inst-combine canonicalization decision. Any ideas of a good place to make that happen?

https://reviews.llvm.org/D114964