[llvm] [NVPTX] Prefer prmt.b32 over bfi.b32 (PR #110766)
Kevin McAfee via llvm-commits
llvm-commits at lists.llvm.org
Wed Oct 9 14:22:06 PDT 2024
kalxr wrote:
> Converting from v = prmt(d, prmt(c, prmt(a,b))) to v = prmt(prmt(c,d), prmt(a,b)) may squeeze a bit more performance here if GPU can do two leaf permutes in parallel as they are independent.
I agree that this would be better for performance, and it doesn't look like ptxas does this.
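
For illustration only, here is a rough Python sketch of the two shapes being compared. It assumes the four inputs are packed byte-wise into one 32-bit value, and the selector constants (0x3340, 0x3410, 0x4210, 0x5410) and the helper names `prmt`, `pack_chain` and `pack_tree` are made up for this example rather than taken from the patch. `prmt` emulates the default (generic) mode of `prmt.b32`, where each selector nibble picks one of the eight source bytes.

```python
def prmt(a, b, sel):
    """Emulate PTX prmt.b32 d, a, b, sel in default (generic) mode."""
    # Source bytes 0..3 come from a, bytes 4..7 come from b.
    src = [(a >> (8 * i)) & 0xFF for i in range(4)]
    src += [(b >> (8 * i)) & 0xFF for i in range(4)]
    out = 0
    for i in range(4):
        nib = (sel >> (4 * i)) & 0xF
        byte = src[nib & 0x7]
        if nib & 0x8:
            # High bit of the nibble: replicate the sign bit of the byte.
            byte = 0xFF if byte & 0x80 else 0x00
        out |= byte << (8 * i)
    return out


def pack_chain(a, b, c, d):
    # Serial form: each prmt depends on the previous one (depth 3).
    t1 = prmt(a, b, 0x3340)    # bytes 0,1 = a0, b0
    t2 = prmt(t1, c, 0x3410)   # byte 2 = c0
    return prmt(t2, d, 0x4210) # byte 3 = d0


def pack_tree(a, b, c, d):
    # Tree form: the two leaf prmts are independent (depth 2).
    lo = prmt(a, b, 0x3340)    # bytes 0,1 = a0, b0
    hi = prmt(c, d, 0x3340)    # bytes 0,1 = c0, d0
    return prmt(lo, hi, 0x5410)


if __name__ == "__main__":
    args = (0xAABBCC11, 0x22, 0xDEAD0033, 0x44)
    print(hex(pack_chain(*args)), hex(pack_tree(*args)))
```

Both forms use three permutes and produce the same packed value. The difference is only in the dependency depth, which is where any gain would come from if the hardware can issue the two leaf permutes in parallel.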
> Do we still want or need it?
Given that this change would include the above improvement, IMO it would be useful.
https://github.com/llvm/llvm-project/pull/110766