[llvm] [NVPTX] Prefer prmt.b32 over bfi.b32 (PR #110766)
Justin Fargnoli via llvm-commits
llvm-commits at lists.llvm.org
Wed Oct 9 14:36:45 PDT 2024
justinfargnoli wrote:
> This implies that this patch is likely a no-op for that actual GPU code on the SASS level.
> Do we still want or need it?
CC @AlexMaclean, as he's voiced similar concerns offline.
My original intention in creating this PR was to implement what I thought you had wanted to do in [[NVPTX] Improve lowering of v4i8](https://github.com/llvm/llvm-project/commit/cbafb6f2f5c99474164dcc725820cbbeb2e02e14) based on [your comment](https://github.com/llvm/llvm-project/pull/67866#discussion_r1343066911).
But, as @kalxr has pointed out, this PR includes the `prmt(d, prmt(c, prmt(a,b))) --> prmt(prmt(c,d), prmt(a,b))` change, which makes the new lowering more performant than the current approach.
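To make the reassociation above concrete, here is a minimal host-side sketch. It assumes the default (generic) mode of `prmt.b32` as described in the PTX ISA; the helper names `build_chain` and `build_tree` and the selector constants are illustrative choices of mine, not the values the PR emits. Both forms use three `prmt`s, but in the tree form the two inner ones do not depend on each other.

```cpp
#include <cstdint>

// Software model of PTX `prmt.b32 d, a, b, sel` in default mode (an
// illustration, not LLVM code). Bytes 0-3 of the pool come from `a`,
// bytes 4-7 from `b`. Each selector nibble picks one pool byte; if the
// nibble's top bit is set, the byte's sign bit is replicated instead.
constexpr uint32_t prmt(uint32_t a, uint32_t b, uint32_t sel) {
    uint64_t pool = (static_cast<uint64_t>(b) << 32) | a;
    uint32_t result = 0;
    for (int i = 0; i < 4; ++i) {
        uint32_t nibble = (sel >> (4 * i)) & 0xF;
        uint32_t byte = static_cast<uint32_t>(pool >> (8 * (nibble & 7))) & 0xFF;
        if (nibble & 8)
            byte = (byte & 0x80) ? 0xFF : 0x00;
        result |= byte << (8 * i);
    }
    return result;
}

// Hypothetical helper: pack the low bytes of a, b, c, d into one word
// with a chain of three dependent prmts, prmt(d, prmt(c, prmt(a, b))).
constexpr uint32_t build_chain(uint32_t a, uint32_t b, uint32_t c, uint32_t d) {
    uint32_t ab  = prmt(a, b, 0x0040);   // [a0, b0, -, -]
    uint32_t abc = prmt(ab, c, 0x0410);  // [a0, b0, c0, -]
    return prmt(abc, d, 0x4210);         // [a0, b0, c0, d0]
}

// Hypothetical helper: same result as a tree, prmt(prmt(c, d), prmt(a, b)).
// `lo` and `hi` are independent, so only two prmts are on the critical path.
constexpr uint32_t build_tree(uint32_t a, uint32_t b, uint32_t c, uint32_t d) {
    uint32_t lo = prmt(a, b, 0x0040);    // [a0, b0, -, -]
    uint32_t hi = prmt(c, d, 0x0040);    // [c0, d0, -, -]
    return prmt(lo, hi, 0x5410);         // [a0, b0, c0, d0]
}

// Both shapes produce the same packed value.
static_assert(build_chain(0x11, 0x22, 0x33, 0x44) == 0x44332211u, "chain");
static_assert(build_tree(0x11, 0x22, 0x33, 0x44) == 0x44332211u, "tree");
```

This only models the dependency structure. Whether the shorter chain pays off after `ptxas` has scheduled the resulting SASS is the open question in this thread.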
I'm also planning to do more optimization work for `prmt`, and this change would let this lowering take advantage of it.
https://github.com/llvm/llvm-project/pull/110766