[llvm] [NVPTX] Enhance `mul.wide` and `mad.wide` peepholes (PR #150477)

Wed Aug 20 05:02:29 PDT 2025

akuegel wrote:

> Hi @akuegel, thank you for reporting this and supplying the PTX diff! I took a look at the PTX diff, and it looks like what we'd expect from this PR. Perhaps there's a peephole/heuristic in `ptxas` that's no longer being triggered?
> 
> Could you supply the CLI arguments you're passing to `ptxas` or the SASS for the before and after PTX?

@justinfargnoli Sorry for the late reply; I was away for the last two days.
[cubin.zip](https://github.com/user-attachments/files/21893146/cubin.zip)

We are using libnvptx, with these compile options:

```
-arch=sm_90a --warn-on-spills
```

I used the same options with `ptxas`; `ptxas --version` shows:

```
ptxas: NVIDIA (R) Ptx optimizing assembler
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Fri_Feb_21_20:21:21_PST_2025
Cuda compilation tools, release 12.8, V12.8.93
Build cuda_12.8.r12.8/compiler.35583870_0
```

The outputs are attached in a zip (files before.o and after.o).

https://github.com/llvm/llvm-project/pull/150477