[llvm] [NVPTX] Add support for nvvm.flo.[us] intrinsics (PR #114489)

Fri Nov 1 13:24:36 PDT 2024

AlexMaclean wrote:

> These intrinsics look like a solution in search of a problem. The instruction has been present in PTX ~ forever, but I've only learned about it today and I can't think of a single case where I would wish for something like `llvm.nvvm.flo`, nor did any LLVM/NVPTX users ever asked me for them.
> 
> So, my first question is -- do we really need them? If so, why?
> 
> If we do need them, is there any benefit in using the instructions? In case where there's no direct h/w support for the functionality, it may be better to expand to LLVM IR and let LLVM optimize that. To me it looks like a combination of logical ops and `llvm.ctlz` might just do the job.

Here is a small godbolt example showing what the PTX for this instruction is lowered to: https://godbolt.org/z/4crzY67oo

While this operation could be emulated with other existing generic instructions, the expansion would be quite complex, especially for the signed case. That expansion could then be transformed by optimizations in various ways, making it difficult for NVPTX ISel or `ptxas` to fold it back into this instruction.
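To make the emulation cost concrete, here is a rough C++ sketch of what a generic expansion would have to compute. It is based on my reading of the `bfind` description in the PTX ISA and is not code from this PR; the names `clz32`, `flo_u32`, and `flo_s32` are hypothetical, and `clz32` only stands in for `llvm.ctlz`.

```cpp
#include <cstdint>

// Stand-in for llvm.ctlz.i32 (hypothetical helper); returns 32 for x == 0.
static uint32_t clz32(uint32_t x) {
  uint32_t n = 0;
  for (uint32_t mask = 0x80000000u; mask != 0 && (x & mask) == 0; mask >>= 1)
    ++n;
  return n;
}

// Unsigned case: position of the most significant set bit,
// or 0xffffffff when no bit is set. With shiftamt, return the left shift
// that would move that bit into the MSB instead (assumed bfind semantics).
static uint32_t flo_u32(uint32_t a, bool shiftamt = false) {
  if (a == 0)
    return 0xffffffffu;
  uint32_t lz = clz32(a);  // 0..31 here, since a != 0
  return shiftamt ? lz : 31u - lz;
}

// Signed case: negative inputs are inverted first, so the search finds the
// most significant non-sign bit; 0 and -1 both yield 0xffffffff.
static uint32_t flo_s32(int32_t a, bool shiftamt = false) {
  uint32_t bits = static_cast<uint32_t>(a);
  if (a < 0)
    bits = ~bits;
  return flo_u32(bits, shiftamt);
}
```

Even in this small form the signed path needs a sign test, a conditional inversion, a zero check, a count-leading-zeros, and a subtraction or select, which is the pattern ISel or `ptxas` would have to recognize after other optimizations have rewritten it.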

While I agree it is not the most commonly used operation, there are cases where this intrinsic is the right tool for the job and where using it can improve performance. A quick search of GitHub did turn up some cases where people have fallen back to inline asm: [https://github.com/search?type=code&auto_enroll=true&q=%28%22bfind.](https://github.com/search?type=code&auto_enroll=true&q=%28%22bfind.). There are also internal use cases that motivated the introduction of this intrinsic, though I'm not sure I can elaborate on them.

https://github.com/llvm/llvm-project/pull/114489