[llvm] [NVPTX] designate fabs and fneg as free (PR #121513)

Tue Jan 7 16:08:24 PST 2025

================
@@ -261,6 +261,9 @@ class NVPTXTargetLowering : public TargetLowering {
     return true;
   }
 
+  bool isFAbsFree(EVT VT) const override { return true; }
+  bool isFNegFree(EVT VT) const override { return true; }
----------------
Artem-B wrote:

> I'm not sure this is the right rule of thumb.

All models are wrong, but some are still useful. :-) I'm all for refining mine.

> we should use the most expressive/specific ptx instruction even if we suspect it may just be syntactic sugar. 
> ptxas may have specific heuristics/peepholes which apply very unevenly and are dependent on which instruction is 
> used. Even if in theory doing the expansion in LLVM should not matter, in many cases it can substantially impact perf.

It can be argued both ways. I've seen cases that support your assertions, and cases where ptxas did not do a particularly great job. Granted, both LLVM and ptxas change over time, so there will never be one true answer suitable for all use cases all the time.

If I have to choose between code I can observe and improve, and some kind of black box that requires blind trust, I'm biased towards the former, unless there's strong evidence that whatever that black box does is objectively better (e.g. the h/w has support for exactly this operation). Without that, it boils down to who can optimize things better: LLVM or ptxas. LLVM has the benefit of having access to more information about the code. ptxas knows more about the hardware. Which one wins depends on the details. Most of the time both are good enough, so the result is a wash.
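
To make the LLVM-side option concrete: as I understand the hooks, returning `false` from `isFAbsFree`/`isFNegFree` tells the combiner it may be worthwhile to replace these ops with integer bit operations on the sign bit. Below is a minimal host-side sketch of that equivalence, assuming IEEE-754 binary32 `float`. The helper names are made up for illustration, and this is not the actual DAG combine code.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helpers: reinterpret a float as its raw 32-bit pattern and back.
std::uint32_t float_bits(float x) {
  std::uint32_t b;
  std::memcpy(&b, &x, sizeof b);
  return b;
}

float bits_float(std::uint32_t b) {
  float x;
  std::memcpy(&x, &b, sizeof x);
  return x;
}

// fabs expressed as an integer AND that clears the sign bit (bit 31).
float fabs_via_and(float x) { return bits_float(float_bits(x) & 0x7FFFFFFFu); }

// fneg expressed as an integer XOR that flips the sign bit (bit 31).
float fneg_via_xor(float x) { return bits_float(float_bits(x) ^ 0x80000000u); }
```

With both hooks returning `true`, as in this patch, LLVM keeps the fabs/fneg form and the choice of final instruction is left to ptxas.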

> Yes, it may not apply to all types equally, still I think it is simplest and clearest to prefer fabs and fneg in all cases due to readability and the possibility of unknown ptxas optimizations.

Readability is a cosmetic factor. Unlike with high-level languages, approximately nobody cares about PTX readability. It's nice to have, but it's not something to lose performance for.

"Unknown ptxas optimizations" is not a very strong argument, IMO. For generic operations like abs/neg, I doubt that ptxas or any other compiler will have some magic sauce that makes a whole lot of difference. In the absence of specific evidence of a SASS-level advantage, my preference is to handle it somewhere where we control the source code.

That said, I do not see much of a downside to this change either, so for the sake of saving time for both of us, I'll stamp the change.


https://github.com/llvm/llvm-project/pull/121513