[all-commits] [llvm/llvm-project] 4263b2: [NVPTX] Expand EXTLOAD for v8f16 and v8bf16 (#72672)
peterbell10 via All-commits
all-commits at lists.llvm.org
Fri Nov 17 09:52:03 PST 2023
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: 4263b2ecf8c4b9b62094a731bb92c501197531b0
https://github.com/llvm/llvm-project/commit/4263b2ecf8c4b9b62094a731bb92c501197531b0
Author: peterbell10 <peterbell10 at live.co.uk>
Date: 2023-11-17 (Fri, 17 Nov 2023)
Changed paths:
M llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
M llvm/test/CodeGen/NVPTX/bf16-instructions.ll
M llvm/test/CodeGen/NVPTX/vector-loads.ll
Log Message:
-----------
[NVPTX] Expand EXTLOAD for v8f16 and v8bf16 (#72672)
In openai/triton#2483 I encountered a bug in the NVPTX codegen. Given a
`load <8 x half>` followed by `fpext to <8 x float>` we get
```
ld.shared.v4.b16 {%f1, %f2, %f3, %f4}, [%r15+8];
ld.shared.v4.b16 {%f5, %f6, %f7, %f8}, [%r15];
```
This loads the float16 values into 32-bit float registers without any
conversion, so the result is simply garbage.
This PR brings `v8f16` and `v8bf16` into line with the other vector
types by expanding the extending load into a plain load plus `cvt`.
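The miscompiled pattern can be sketched in LLVM IR roughly as follows (a minimal illustration of a shared-memory extending vector load; the actual Triton-generated IR differs):

```
; Sketch of the problematic pattern: a <8 x half> load from
; shared memory (addrspace(3)) widened to <8 x float>.
define <8 x float> @extload_v8f16(ptr addrspace(3) %p) {
  %v = load <8 x half>, ptr addrspace(3) %p, align 16
  %e = fpext <8 x half> %v to <8 x float>
  ret <8 x float> %e
}
```

With the fix, the backend no longer folds the `fpext` into the vector load for these types; it emits the `ld` and separate `cvt.f32.f16` conversions instead.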
cc @manman-ren @Artem-B @jlebar