[llvm-dev] ARM vectorized fp16 support

Sjoerd Meijer via llvm-dev llvm-dev at lists.llvm.org
Wed Sep 4 23:18:50 PDT 2019

Which version of Clang are you using? I do get a "vfma.f16" with a recent trunk build. I haven't checked older versions or when this landed, but we made an effort to plug the remaining fp16 holes not that long ago, so hopefully a newer version will just work for you.

From: llvm-dev <llvm-dev-bounces at lists.llvm.org> on behalf of Yizhi Liu via llvm-dev <llvm-dev at lists.llvm.org>
Sent: 05 September 2019 06:52
To: llvm-dev at lists.llvm.org <llvm-dev at lists.llvm.org>
Subject: [llvm-dev] ARM vectorized fp16 support


I'm trying to compile a half-precision program for ARM, but it seems
LLVM fails to automatically generate fused multiply-add instructions
for c += a * b. Did I do something wrong? If not, is this a missing
feature that will be supported later? (I know there are fp16 FMLA
intrinsics, though.)

Test programs and outputs:

$ clang -O3 -march=armv8.2-a+fp16fml -ffast-math -S -o- vfp32.c
test_vfma_lane_f16:                     // @test_vfma_lane_f16
                fmla       v2.4s, v1.4s, v0.4s   // fp32 is GOOD
                mov       v0.16b, v2.16b
$ cat vfp32.c
#include <arm_neon.h>
float32x4_t test_vfma_lane_f16(float32x4_t a, float32x4_t b, float32x4_t c) {
  c += a * b;
  return c;
}

$ clang -O3 -march=armv8.2-a+fp16fml -ffast-math -S -o- vfp16.c
test_vfma_lane_f16:                     // @test_vfma_lane_f16
                fmul       v0.4h, v1.4h, v0.4h
                fadd       v0.4h, v0.4h, v2.4h  // fp16 does NOT use FMLA
$ cat vfp16.c
#include <arm_neon.h>
float16x4_t test_vfma_lane_f16(float16x4_t a, float16x4_t b, float16x4_t c) {
  c += a * b;
  return c;
}
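
For reference, the intrinsic route mentioned above could look roughly like the sketch below. It requests the fused operation explicitly with vfma_f16 from arm_neon.h, which computes c + a * b per lane, rather than relying on the compiler to contract c += a * b. The helper names fma_f16x4 and fma_demo_lane0 are hypothetical, and the plain-float fallback is only an assumption added so the file still builds on targets without fp16 vector arithmetic.

```c
#if defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
#include <arm_neon.h>

/* Hypothetical helper: explicit fused multiply-add on four fp16 lanes.
 * vfma_f16(acc, x, y) returns acc + x * y for each lane. */
float16x4_t fma_f16x4(float16x4_t a, float16x4_t b, float16x4_t c) {
  return vfma_f16(c, a, b);
}
#endif

/* Hypothetical demo: evaluates 2 * 3 + 1 and returns lane 0 as a float.
 * On targets without fp16 vector arithmetic it falls back to plain float
 * arithmetic (an assumption made only to keep the sketch portable). */
float fma_demo_lane0(void) {
#if defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
  float16x4_t a = vdup_n_f16((float16_t)2.0f);
  float16x4_t b = vdup_n_f16((float16_t)3.0f);
  float16x4_t c = vdup_n_f16((float16_t)1.0f);
  return (float)vget_lane_f16(fma_f16x4(a, b, c), 0);
#else
  return 2.0f * 3.0f + 1.0f;
#endif
}
```

Built with the same -march=armv8.2-a+fp16fml flags as above, fma_f16x4 would be expected to lower to a single fp16 FMLA, though the exact output depends on the Clang version.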

Yizhi Liu