[llvm] [NVPTX] Prefer prmt.b32 over bfi.b32 (PR #110766)

Artem Belevich via llvm-commits llvm-commits at lists.llvm.org
Fri Oct 4 16:24:14 PDT 2024


Artem-B wrote:

> > According to https://arxiv.org/pdf/2208.11174 BFI is much more expensive than PRMT which appears to take just 1 cycle on A100:
> 
> I think this refers to `bfi` in the general case. When the `c` and `d` operands are multiples of 8, `bfi` can be run as a `prmt`.

ptxas actually does compile such `bfi` into `prmt`: https://godbolt.org/z/9xTrhG99v
This implies that this patch is likely a no-op for the actual GPU code at the SASS level.
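
To illustrate why a byte-aligned `bfi` can be expressed as a `prmt`, here is a rough host-side sketch. It assumes the `bfi.b32` and `prmt.b32` (default mode, no sign replication) semantics as I read them in the PTX ISA. The helper names (`bfi_b32`, `prmt_b32`, `bfi_to_prmt_selector`) are made up for illustration; this is not what ptxas or the patch actually does internally.

```cpp
#include <cstdint>

// Host-side model of PTX `bfi.b32 f, a, b, pos, len` (assumed semantics):
// copy the low `len` bits of `a` into `b`, starting at bit `pos`.
uint32_t bfi_b32(uint32_t a, uint32_t b, uint32_t pos, uint32_t len) {
    pos &= 0xff;
    len &= 0xff;
    uint32_t f = b;
    for (uint32_t i = 0; i < len && pos + i <= 31; ++i) {
        uint32_t bit = (a >> i) & 1u;
        f = (f & ~(1u << (pos + i))) | (bit << (pos + i));
    }
    return f;
}

// Host-side model of PTX `prmt.b32 d, a, b, sel` in default mode,
// ignoring the sign-replication bit of each selector nibble.
// Bytes 0..3 come from `a`, bytes 4..7 come from `b`.
uint32_t prmt_b32(uint32_t a, uint32_t b, uint32_t sel) {
    uint64_t pool = (static_cast<uint64_t>(b) << 32) | a;
    uint32_t d = 0;
    for (uint32_t i = 0; i < 4; ++i) {
        uint32_t idx = (sel >> (4 * i)) & 0x7;
        uint32_t byte = static_cast<uint32_t>((pool >> (8 * idx)) & 0xff);
        d |= byte << (8 * i);
    }
    return d;
}

// Hypothetical helper: build the prmt selector equivalent to a bfi whose
// `pos` and `len` are both multiples of 8 (the caller must guarantee this).
uint32_t bfi_to_prmt_selector(uint32_t pos, uint32_t len) {
    uint32_t first = pos / 8;  // first destination byte overwritten
    uint32_t count = len / 8;  // number of bytes taken from `a`
    uint32_t sel = 0;
    for (uint32_t i = 0; i < 4; ++i) {
        bool from_a = i >= first && i < first + count;
        // Inserted bytes are a.b0, a.b1, ...; all others keep b's byte i.
        uint32_t idx = from_a ? (i - first) : (4 + i);
        sel |= idx << (4 * i);
    }
    return sel;
}
```

For example, with `pos = 8` and `len = 16` the selector works out to `0x7104`, and both models produce `0x11CCDD44` for `a = 0xAABBCCDD`, `b = 0x11223344`.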

Do we still want or need it? 

https://github.com/llvm/llvm-project/pull/110766
