[PATCH] D136919: [X86][RFC] Change mangle name of __bf16 from u6__bf16 to DF16b
Artem Belevich via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Fri Nov 4 13:43:21 PDT 2022
tra added a comment.
In D136919#3909044 <https://reviews.llvm.org/D136919#3909044>, @rjmccall wrote:
> If Steve's argument holds, it suggests that it would be a waste for NVPTX to support BF16 arithmetic directly, at least not in the same way it supports `float` or `double`. (Providing operations like x86's VDPBF16PS, https://en.wikichip.org/wiki/x86/avx512_bf16, which take BF16 operands but perform their arithmetic in `float`, is a different story.)
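To make that split concrete, here is a minimal C++ sketch (names are illustrative, not from the patch) of a dot product that keeps its operands in bf16 storage but does every multiply/add in `float`, the same shape of computation VDPBF16PS performs in hardware:

    #include <cstddef>
    #include <cstdint>
    #include <cstring>

    // bf16 is just the upper 16 bits of the corresponding IEEE-754 float,
    // so widening is a shift followed by a bit-cast.
    static float bf16_to_float(std::uint16_t b) {
      std::uint32_t bits = static_cast<std::uint32_t>(b) << 16;
      float f;
      std::memcpy(&f, &bits, sizeof(f));
      return f;
    }

    // Operands stay in bf16 storage; every multiply/add happens in float,
    // mirroring the split VDPBF16PS makes in hardware.
    float dot_bf16(const std::uint16_t *a, const std::uint16_t *b, std::size_t n) {
      float acc = 0.0f;
      for (std::size_t i = 0; i < n; ++i)
        acc += bf16_to_float(a[i]) * bf16_to_float(b[i]);
      return acc;
    }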
On one hand, I agree that bfloat's low precision makes it somewhat problematic to use as-is, without doing the actual math in single- or double-precision floats.
On the other hand, the primary use case for bfloat is machine learning, where applications are often less concerned about precision but care a great deal about the magnitude of the numbers and about raw number-crunching performance. The typical pattern observed in the wild is to do as much as possible in bf16 (or fp16, depending on the accelerator in use) and to fall back to higher precision only for the operations that really need it.
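As a minimal sketch of that mixed-precision pattern (assuming a Clang target where `__bf16` arithmetic and `__bf16`/`float` conversions are available; the function name is illustrative), the bulk of the elementwise work stays in bf16 and only the rounding-sensitive reduction is accumulated in `float`:

    #include <cstddef>

    // Illustrative only: assumes a target where __bf16 supports native
    // arithmetic and conversion to/from float.
    float scale_and_sum(const __bf16 *x, __bf16 *out, __bf16 scale, std::size_t n) {
      float sum = 0.0f;                      // precision-sensitive reduction in float
      for (std::size_t i = 0; i < n; ++i) {
        __bf16 v = x[i] * scale;             // bulk elementwise math stays in bf16
        out[i] = v;
        sum += static_cast<float>(v);        // widen only for the accumulation
      }
      return sum;
    }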
While heavy-weight ops like dot-product, matmul, etc. will continue to consume the bulk of the GPU cycles, there is a practical need for simple ops, too. A lot of the performance benefit in ML applications comes from fusing multiple simple kernels into one, and that rarely maps onto those accelerated instructions. We need to be able to do plain old add/sub/mul/cmp. If bf16 has higher throughput than fp32, that will provide a tangible benefit for large classes of ML applications.
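For illustration, a sketch of such a fusion (again assuming native `__bf16` arithmetic; on a GPU this loop body would be a single fused kernel rather than three separate launches): a bias-add, a compare/select (ReLU), and a scale done in one pass over the data, all in bf16:

    #include <cstddef>

    // Sketch of a fused elementwise op (bias-add + ReLU + scale) done directly
    // in bf16. Assumes a target where __bf16 arithmetic is available.
    void fused_bias_relu_scale(const __bf16 *x, const __bf16 *bias, __bf16 scale,
                               __bf16 *out, std::size_t n) {
      const __bf16 zero = static_cast<__bf16>(0.0f);
      for (std::size_t i = 0; i < n; ++i) {
        __bf16 v = x[i] + bias[i];      // add
        v = (v > zero) ? v : zero;      // cmp + select (ReLU)
        out[i] = v * scale;             // mul
      }
    }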
My bet is that bf16 math will be used extensively once NVIDIA's H100 GPUs become widely available. We have sufficient evidence that `bf16` works well enough on multiple generations of Google's TPUs, and the trend is likely to continue with more platforms adopting it. Fun fact: NVIDIA is introducing an even lower-precision format in its new GPUs: FP8 <https://developer.nvidia.com/blog/nvidia-hopper-architecture-in-depth/#:~:text=NVIDIA%20Hopper%20FP8%20data%20format>.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D136919/new/
https://reviews.llvm.org/D136919