[llvm] [NVPTX] Prefer prmt.b32 over bfi.b32 (PR #110766)

Kevin McAfee via llvm-commits llvm-commits at lists.llvm.org
Wed Oct 9 14:22:06 PDT 2024


kalxr wrote:

> Converting from v = prmt(d, prmt(c, prmt(a,b))) to v = prmt(prmt(c,d), prmt(a,b)) may squeeze a bit more performance here if GPU can do two leaf permutes in parallel as they are independent. 

I agree that this would be better for performance, and it doesn't look like ptxas does this.
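
To make the two shapes concrete, here is a rough host-side sketch in C++ (not code from the PR). It models `prmt.b32` in its default mode as I understand it from the PTX ISA, then packs four byte values once as a dependent chain and once as a tree with two independent leaf permutes. The helper names (`prmt`, `pack_chain`, `pack_tree`, `check_equivalence`) and the selector constants are assumptions chosen for illustration; they are not necessarily what the NVPTX backend emits.

```cpp
#include <cassert>
#include <cstdint>

// Host-side model of PTX prmt.b32 (default mode), for illustration only.
// Bytes 0-3 of the 8-byte pool come from `a`, bytes 4-7 from `b`.
// Each selector nibble picks one pool byte; if bit 3 of the nibble is set,
// the sign bit of the picked byte is replicated across the result byte.
uint32_t prmt(uint32_t a, uint32_t b, uint32_t sel) {
    uint64_t pool = (static_cast<uint64_t>(b) << 32) | a;
    uint32_t result = 0;
    for (int i = 0; i < 4; ++i) {
        uint32_t nibble = (sel >> (4 * i)) & 0xF;
        uint32_t byte = static_cast<uint32_t>(pool >> (8 * (nibble & 0x7))) & 0xFF;
        if (nibble & 0x8) byte = (byte & 0x80) ? 0xFF : 0x00;
        result |= byte << (8 * i);
    }
    return result;
}

// Chain shape: every permute depends on the previous one (3 levels deep).
// Selectors are hypothetical; each step inserts one more low byte.
uint32_t pack_chain(uint32_t a, uint32_t b, uint32_t c, uint32_t d) {
    uint32_t ab  = prmt(a, b, 0x3340);    // byte0 = a.b0, byte1 = b.b0
    uint32_t abc = prmt(ab, c, 0x3410);   // keep bytes 0-1, byte2 = c.b0
    return prmt(abc, d, 0x4210);          // keep bytes 0-2, byte3 = d.b0
}

// Tree shape: the two leaf permutes are independent (2 levels deep).
uint32_t pack_tree(uint32_t a, uint32_t b, uint32_t c, uint32_t d) {
    uint32_t ab = prmt(a, b, 0x3340);     // byte0 = a.b0, byte1 = b.b0
    uint32_t cd = prmt(c, d, 0x3340);     // byte0 = c.b0, byte1 = d.b0
    return prmt(ab, cd, 0x5410);          // low halves of ab and cd
}

// Both shapes should yield the same packed value for the same inputs.
void check_equivalence() {
    assert(pack_chain(0x11, 0x22, 0x33, 0x44) == pack_tree(0x11, 0x22, 0x33, 0x44));
}
```

Both functions use three permutes; the difference is only the dependency depth, which is where any gain would come from if the hardware can overlap the two leaves.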

> Do we still want or need it?

Given that this change would include the above improvement, IMO it would be useful.

https://github.com/llvm/llvm-project/pull/110766

