[llvm] [NVPTX] Enhance `mul.wide` and `mad.wide` peepholes (PR #150477)
Adrian Kuegel via llvm-commits
llvm-commits at lists.llvm.org
Wed Aug 20 05:02:29 PDT 2025
akuegel wrote:
> Hi @akuegel, thank you for reporting this and supplying the PTX diff! I took a look at the PTX diff, and it looks like what we'd expect from this PR. Perhaps there's a peephole/heuristic in `ptxas` that's no longer being triggered?
>
> Could you supply the CLI arguments you're passing to `ptxas` or the SASS for the before and after PTX?
@justinfargnoli Sorry for the late reply, I was away the last two days.
[cubin.zip](https://github.com/user-attachments/files/21893146/cubin.zip)
We are using libnvptx, with these compile options:
`-arch=sm_90a --warn-on-spills`
I used the same options with `ptxas`; `ptxas --version` shows:
```
ptxas: NVIDIA (R) Ptx optimizing assembler
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Fri_Feb_21_20:21:21_PST_2025
Cuda compilation tools, release 12.8, V12.8.93
Build cuda_12.8.r12.8/compiler.35583870_0
```
The outputs (`before.o` and `after.o`) are in the attached zip.
https://github.com/llvm/llvm-project/pull/150477