[llvm] [NVPTX] support packed f32 instructions for sm_100+ (PR #126337)
Princeton Ferro via llvm-commits
llvm-commits at lists.llvm.org
Sat Jun 7 02:29:45 PDT 2025
Prince781 wrote:
**Update:**
I've streamlined the PR and it's much smaller now. Many of the changes in the test cases are just register renames.
The structuring/destructuring `mov.b64`s may look ugly, but they're mostly NOPs at the SASS level because 64-bit registers in PTX are emulated with two 32-bit registers in SASS. I prefer not to introduce too many changes to existing pre-SM100 PTX, and only a few instructions currently support `f32x2`, which is why I'm forcing loads/stores of v2f32 to be broken apart.
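To illustrate why those `mov.b64`s are cheap, here is a rough host-side C++ model of what packing and unpacking a `v2f32` amounts to at the bit level. This is only a sketch, not code from the PR: the names `F32x2`, `pack_f32x2`, and `unpack_f32x2` are hypothetical, and it assumes the first element of the pair lands in the low 32 bits of the 64-bit value.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical type for illustration: two f32 lanes of a v2f32.
struct F32x2 {
    float lo;  // assumed to occupy bits 0..31 of the packed value
    float hi;  // assumed to occupy bits 32..63 of the packed value
};

// Models a packing move such as `mov.b64 %rd, {%f_lo, %f_hi};`.
// No arithmetic is done on the floats; their bits are only placed side by side.
inline std::uint64_t pack_f32x2(float lo, float hi) {
    std::uint32_t lo_bits, hi_bits;
    std::memcpy(&lo_bits, &lo, sizeof lo_bits);
    std::memcpy(&hi_bits, &hi, sizeof hi_bits);
    return (static_cast<std::uint64_t>(hi_bits) << 32) | lo_bits;
}

// Models an unpacking move such as `mov.b64 {%f_lo, %f_hi}, %rd;`.
inline F32x2 unpack_f32x2(std::uint64_t packed) {
    const std::uint32_t lo_bits = static_cast<std::uint32_t>(packed);
    const std::uint32_t hi_bits = static_cast<std::uint32_t>(packed >> 32);
    F32x2 out;
    std::memcpy(&out.lo, &lo_bits, sizeof out.lo);
    std::memcpy(&out.hi, &hi_bits, sizeof out.hi);
    return out;
}
```

Since the 64-bit value is just the two 32-bit halves placed next to each other, a backend that already keeps a 64-bit register as a pair of 32-bit registers can, in most cases, treat the pack or unpack as a register renaming rather than real work.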
I have some ideas for cleaning up these `mov`s which I'd like to present in another PR, just to keep this one easy to review.
https://github.com/llvm/llvm-project/pull/126337