[llvm] [NVPTX] Enhance `mul.wide` and `mad.wide` peepholes (PR #150477)

Fri Aug 15 05:12:24 PDT 2025

akuegel wrote:

@justinfargnoli 

It looks like this part of the change is causing performance regressions for us:
"Implements (add (mul.wide a, b), c) -> (mad.wide a, b, c) in instruction selection."
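For reference, here is a minimal sketch of what that fold means semantically for the unsigned case, written as plain C++ rather than PTX. The function names are hypothetical and only model the arithmetic (a 32x32 -> 64-bit multiply followed by a 64-bit add); they say nothing about how either form is lowered to SASS or how fast it runs.

```cpp
#include <cstdint>

// Models PTX mul.wide.u32: multiply two 32-bit values into a full 64-bit product.
constexpr std::uint64_t mul_wide_u32(std::uint32_t a, std::uint32_t b) {
    return static_cast<std::uint64_t>(a) * static_cast<std::uint64_t>(b);
}

// Models PTX mad.wide.u32: the 64-bit product plus a 64-bit addend,
// wrapping modulo 2^64 like any unsigned 64-bit add.
constexpr std::uint64_t mad_wide_u32(std::uint32_t a, std::uint32_t b,
                                     std::uint64_t c) {
    return static_cast<std::uint64_t>(a) * static_cast<std::uint64_t>(b) + c;
}

// The unfused form matched by the pattern: (add (mul.wide a, b), c).
constexpr std::uint64_t unfused(std::uint32_t a, std::uint32_t b,
                                std::uint64_t c) {
    return mul_wide_u32(a, b) + c;
}

// Both forms produce the same value, so the fold is value-preserving;
// any difference between them is purely a matter of codegen and scheduling.
static_assert(unfused(3u, 5u, 7u) == mad_wide_u32(3u, 5u, 7u),
              "fold must preserve the result");
```

The signed variant is analogous, with sign extension of the 32-bit operands instead of zero extension.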

I checked that removing these patterns recovers the performance:

```
defm MAD_WIDE_U32 : MAD_WIDE<"u32", mul_wide_unsigned_oneuse, I64RT, I32RT>;
defm MAD_WIDE_S32 : MAD_WIDE<"s32", mul_wide_signed_oneuse, I64RT, I32RT>;
```

I am attaching before.ptx and after.ptx. Maybe looking at the generated SASS will help figure out why the new code is slower?
[before.ptx.txt](https://github.com/user-attachments/files/21794739/before.ptx.txt)
[after.ptx.txt](https://github.com/user-attachments/files/21794740/after.ptx.txt)

https://github.com/llvm/llvm-project/pull/150477