[llvm] [NVPTX] Prefer prmt.b32 over bfi.b32 (PR #110766)

Justin Fargnoli via llvm-commits llvm-commits at lists.llvm.org
Wed Oct 9 14:36:45 PDT 2024


justinfargnoli wrote:

> This implies that this patch is likely a no-op for the actual GPU code on the SASS level.
> Do we still want or need it?

CC @AlexMaclean, as he's voiced similar concerns offline. 

My original intention with this PR was to implement what I understood you wanted to do in [[NVPTX] Improve lowering of v4i8](https://github.com/llvm/llvm-project/commit/cbafb6f2f5c99474164dcc725820cbbeb2e02e14), based on [your comment](https://github.com/llvm/llvm-project/pull/67866#discussion_r1343066911).

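But, as @kalxr has pointed out, this PR also includes the `prmt(d, prmt(c, prmt(a,b))) --> prmt(prmt(c,d), prmt(a,b))` change, which makes it more performant than the current approach: `prmt(a,b)` and `prmt(c,d)` are independent, so the dependency chain shrinks from three sequential `prmt`s to two levels.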
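As a rough illustration of the rebalancing, here is a small Python model of `prmt.b32` in its default (generic) mode, assuming the PTX ISA semantics: source bytes 0-3 come from `a`, bytes 4-7 from `b`, and each 4-bit selector nibble picks a source byte (msb set means replicate that byte's sign bit). It packs the low byte of four registers into one 32-bit value both ways. The helper names (`prmt_b32`, `pack_chain`, `pack_tree`) and the selector constants are hypothetical choices for this sketch, not necessarily what the NVPTX backend emits, and the model says nothing about the resulting SASS.

```python
def prmt_b32(a: int, b: int, sel: int) -> int:
    """Hypothetical software model of PTX `prmt.b32 d, a, b, sel` (generic mode)."""
    src = ((b & 0xFFFFFFFF) << 32) | (a & 0xFFFFFFFF)  # bytes 0-3 from a, 4-7 from b
    d = 0
    for i in range(4):
        nib = (sel >> (4 * i)) & 0xF  # selector nibble for destination byte i
        byte = (src >> (8 * (nib & 0x7))) & 0xFF
        if nib & 0x8:  # msb set: replicate the selected byte's sign bit
            byte = 0xFF if byte & 0x80 else 0x00
        d |= byte << (8 * i)
    return d


def pack_chain(a: int, b: int, c: int, d: int) -> int:
    """prmt(d, prmt(c, prmt(a, b))): three dependent prmt ops (depth 3)."""
    t = prmt_b32(a, b, 0x3340)     # byte0 = a.b0, byte1 = b.b0, upper bytes don't-care
    t = prmt_b32(t, c, 0x3410)     # keep bytes 0-1, byte2 = c.b0
    return prmt_b32(t, d, 0x4210)  # keep bytes 0-2, byte3 = d.b0


def pack_tree(a: int, b: int, c: int, d: int) -> int:
    """prmt(prmt(c, d), prmt(a, b)): the two inner ops are independent (depth 2)."""
    ab = prmt_b32(a, b, 0x3340)      # low half: a.b0, b.b0
    cd = prmt_b32(c, d, 0x3340)      # high half: c.b0, d.b0
    return prmt_b32(ab, cd, 0x5410)  # merge: ab.b0, ab.b1, cd.b0, cd.b1
```

For example, `pack_tree(0x11, 0x22, 0x33, 0x44)` and `pack_chain(0x11, 0x22, 0x33, 0x44)` both yield `0x44332211`, even if the upper bytes of the inputs hold garbage. The difference is scheduling: in the tree form the first two `prmt`s do not depend on each other.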

I'm also planning to do more optimization work on `prmt`; lowering through `prmt` would let this code path benefit from that work.

https://github.com/llvm/llvm-project/pull/110766

