[llvm] [NVPTX] Use PRMT instruction to lower i16 bswap (PR #168968)

Artem Belevich via llvm-commits llvm-commits at lists.llvm.org
Thu Nov 20 16:55:08 PST 2025


Artem-B wrote:

> Here is the sass comparison.

Fascinating. The new code is clearly an improvement, but it looks like we can (sometimes?) do even better. Right now ptxas emits one PRMT to zero-extend the i16 to i32, and only then a second PRMT to swap the bytes:

```
 PRMT R4, R4, 0x7710, RZ 
 PRMT R4, R4, 0x7701, RZ 
```

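We only need one PRMT for that: the swap selector 0x7701 reads only bytes 0 and 1 of the source and fills the upper two bytes from RZ, so the preceding zero-extension is redundant. Unfortunately, I see no easy way to avoid the cvt.u32.u16. The only way I managed to get ptxas to generate a single PRMT was by using an extending load: https://godbolt.org/z/6PWx3hxda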

It looks like this may need to be handled by ptxas.


https://github.com/llvm/llvm-project/pull/168968

