[llvm] [NVPTX] support packed f32 instructions for sm_100+ (PR #126337)

Nikolay Panchenko via llvm-commits llvm-commits at lists.llvm.org
Fri Jul 18 10:48:31 PDT 2025


npanchen wrote:

> @Prince781 due to this change we have lots of runtime errors, i.e. accuracy mismatches, which #149393 does not solve. If @bangtianliu observed this too, it's worth reverting that change.

The main difference I see is right after isel:
```
; good ir

%61:b32 = CVT_bf16x2_f32 killed %60:b32, killed %59:b32, 5
%62:b32 = CVT_bf16x2_f32 killed %58:b32, killed %57:b32, 5
...
ST_i32 %62:b32,...
ST_i32 %61:b32,...
...
```

```
; bad ir

%61:b64 = V2I32toI64 killed %59:b32, killed %60:b32
%62:b64 = V2I32toI64 killed %57:b32, killed %58:b32
...
ST_i64 %62:b64,...
ST_i64 %61:b64,...
```

i.e. after that change, no f32->bf16 conversions are generated: the pairs of 32-bit values are instead packed with V2I32toI64 and stored with ST_i64.
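
To illustrate why the two sequences cannot be equivalent, here is a rough host-side C++ sketch of what each one stores. It is not taken from the backend. It assumes the mode operand `5` means round-to-nearest-even and that the first `CVT_bf16x2_f32` source lands in the upper 16 bits (as PTX `cvt.rn.bf16x2.f32` is documented to do). The helper names are hypothetical and NaN handling is omitted.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret the bits of a float as a 32-bit integer.
static uint32_t f32_bits(float x) {
    uint32_t u;
    std::memcpy(&u, &x, sizeof u);
    return u;
}

// f32 -> bf16 with round-to-nearest-even.
// Assumption: models rounding mode 5 in the dump; NaNs are not special-cased.
static uint16_t f32_to_bf16_rne(float x) {
    uint32_t u = f32_bits(x);
    uint32_t bias = 0x7FFFu + ((u >> 16) & 1u);  // ties go to the even mantissa
    return static_cast<uint16_t>((u + bias) >> 16);
}

// "Good" sequence: two f32 values are converted and packed into one
// 32-bit bf16x2 word (first operand in the upper half).
static uint32_t pack_bf16x2(float hi, float lo) {
    return (static_cast<uint32_t>(f32_to_bf16_rne(hi)) << 16) |
           f32_to_bf16_rne(lo);
}

// "Bad" sequence: the raw 32-bit patterns are glued into a 64-bit word
// with no conversion at all (first operand in the lower half).
static uint64_t pack_raw_f32x2(float lo, float hi) {
    return (static_cast<uint64_t>(f32_bits(hi)) << 32) | f32_bits(lo);
}
```

With inputs 1.0f and 2.0f, the first helper produces the 4-byte word 0x3F804000, while the second produces the 8-byte word 0x3F80000040000000, so a consumer expecting bf16 data reads different values and a different number of bytes.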

https://github.com/llvm/llvm-project/pull/126337

