[llvm] [NVPTX] support packed f32 instructions for sm_100+ (PR #126337)
Nikolay Panchenko via llvm-commits
llvm-commits at lists.llvm.org
Fri Jul 18 10:48:31 PDT 2025
npanchen wrote:
> @Prince781 due to this change we have lots of runtime errors, i.e. accuracy mismatches that #149393 does not solve. If @bangtianliu observed this too, it's worth reverting that change.
The main difference I see is right after isel:
```
; good ir
%61:b32 = CVT_bf16x2_f32 killed %60:b32, killed %59:b32, 5
%62:b32 = CVT_bf16x2_f32 killed %58:b32, killed %57:b32, 5
...
ST_i32 %62:b32,...
ST_i32 %61:b32,...
...
```
```
; bad ir
%61:b64 = V2I32toI64 killed %59:b32, killed %60:b32
%62:b64 = V2I32toI64 killed %57:b32, killed %58:b32
...
ST_i64 %62:b64,...
ST_i64 %61:b64,...
```
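I.e., after that change no f32->bf16 conversions are generated in the IR above: the two 32-bit values are packed with `V2I32toI64` and stored as 64 bits, instead of being converted with `CVT_bf16x2_f32` and stored as 32 bits.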
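For illustration only, here is a rough Python sketch of what the two sequences would write to memory. It assumes that the trailing `5` on `CVT_bf16x2_f32` selects round-to-nearest-even, that the conversion puts its first operand in the upper half of the result, and that `V2I32toI64` puts its first operand in the low half. The helper names are hypothetical and NaN inputs are not handled.

```python
import struct


def f32_bits(x: float) -> int:
    """Return the IEEE-754 binary32 bit pattern of x as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def f32_to_bf16_rn(x: float) -> int:
    """Round an f32 to bf16 (nearest, ties to even); returns 16 bits.

    NaN inputs are not handled in this sketch.
    """
    bits = f32_bits(x)
    # Add 0x7FFF plus the lowest kept bit so that ties round to even.
    bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + bias) >> 16) & 0xFFFF


def good_store(a: float, b: float) -> int:
    """Model of the 'good' sequence: convert both values, store 32 bits."""
    return (f32_to_bf16_rn(a) << 16) | f32_to_bf16_rn(b)


def bad_store(lo: float, hi: float) -> int:
    """Model of the 'bad' sequence: pack the raw f32 bits, store 64 bits."""
    return (f32_bits(hi) << 32) | f32_bits(lo)


if __name__ == "__main__":
    print(hex(good_store(1.0, 2.0)))  # two bf16 values in one 32-bit word
    print(hex(bad_store(2.0, 1.0)))   # two unconverted f32 values in 64 bits
```

Under these assumptions the "bad" sequence writes twice as many bytes, and none of them have been rounded to bf16, so a consumer expecting packed bf16 data would read different values.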
https://github.com/llvm/llvm-project/pull/126337