[llvm] [NVPTX] Prefer prmt.b32 over bfi.b32 (PR #110766)
Artem Belevich via llvm-commits
llvm-commits at lists.llvm.org
Fri Oct 4 16:24:14 PDT 2024
Artem-B wrote:
> > According to https://arxiv.org/pdf/2208.11174 BFI is much more expensive than PRMT, which appears to take just 1 cycle on A100:
>
> I think this refers to `bfi` in the general case. When the `c` and `d` operands are multiples of 8, `bfi` can be run as a `prmt`.
Ptxas does actually compile such `bfi` into `prmt`: https://godbolt.org/z/9xTrhG99v
This implies that this patch is likely a no-op for the actual GPU code at the SASS level.
Do we still want or need it?
https://github.com/llvm/llvm-project/pull/110766