[llvm-dev] ARM vectorized fp16 support

Thu Sep 5 07:18:34 PDT 2019

Hello again!
I got confused by the "compile half precision program for ARM" and assumed --target=arm, because no target was given in your compile commands, but you're targeting AArch64! Sorry about that; I didn't look carefully enough at your assembly. Anyway, it looks like you're right and we're missing an opportunity here!

Usually this kind of thing is just a missing instruction selection pattern. I am not promising anything, but I will see if I can fix this on the side.

Feel free to open a bug report.
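
In the meantime, the fp16 FMLA intrinsics mentioned in the original message can be used to request the fused operation explicitly. The following is only a minimal sketch, not a tested fix: the names half_t and fma_accumulate4 are made up for illustration, and the scalar fallback exists only so the file still builds on targets without fp16 vector arithmetic. It assumes the ACLE convention that vfma_f16(acc, x, y) computes acc + x * y per lane.

```c
#include <stddef.h>

#if defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
#include <arm_neon.h>
typedef float16_t half_t;   /* real half precision, e.g. -march=armv8.2-a+fp16 */
#else
typedef float half_t;       /* assumption: stand-in type so the sketch builds elsewhere */
#endif

/* Hypothetical helper: c[i] += a[i] * b[i] for four lanes. */
void fma_accumulate4(const half_t a[4], const half_t b[4], half_t c[4]) {
#if defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
  float16x4_t va = vld1_f16(a);
  float16x4_t vb = vld1_f16(b);
  float16x4_t vc = vld1_f16(c);
  vc = vfma_f16(vc, va, vb);  /* explicit fused multiply-add: vc + va * vb */
  vst1_f16(c, vc);
#else
  /* Plain scalar fallback, not fused. */
  for (size_t i = 0; i < 4; ++i)
    c[i] += a[i] * b[i];
#endif
}
```

With fp16 vector arithmetic enabled, this should select the fp16 FMLA form directly instead of relying on the compiler to combine a separate multiply and add.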

Cheers,
Sjoerd.

________________________________
From: Yizhi Liu <javelinjs at gmail.com>
Sent: 05 September 2019 07:41
To: Sjoerd Meijer <Sjoerd.Meijer at arm.com>
Cc: llvm-dev at lists.llvm.org <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] ARM vectorized fp16 support

Thanks for the reply. I was using LLVM 8.0. Let me try trunk, and I will
let you know if it works.

On Wed, Sep 4, 2019 at 11:19 PM Sjoerd Meijer <Sjoerd.Meijer at arm.com> wrote:
>
> Hi,
> Which version of Clang are you using? I do get a "vfma.f16" with a recent trunk build. I haven't checked older versions or when this landed, but we made an effort to plug the remaining fp16 holes not that long ago, so hopefully a newer version will just work for you.
>
> Cheers,
> Sjoerd.
> ________________________________
> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> on behalf of Yizhi Liu via llvm-dev <llvm-dev at lists.llvm.org>
> Sent: 05 September 2019 06:52
> To: llvm-dev at lists.llvm.org <llvm-dev at lists.llvm.org>
> Subject: [llvm-dev] ARM vectorized fp16 support
>
> Hi,
>
> I'm trying to compile half precision program for ARM, while it seems
> LLVM fails to automatically generate fused-multiply-add instructions
> for c += a * b. I'm wondering whether I did something wrong. If not,
> is it a missing feature that will be supported later? (I know there are
> fp16 FMLA intrinsics, though.)
>
> Test programs and outputs:
>
> $ clang -O3 -march=armv8.2-a+fp16fml -ffast-math -S -o- vfp32.c
> test_vfma_lane_f16:                     // @test_vfma_lane_f16
>                 fmla       v2.4s, v1.4s, v0.4s   // fp32 is GOOD
>                 mov       v0.16b, v2.16b
>                 ret
> $ cat vfp32.c
> #include <arm_neon.h>
> float32x4_t test_vfma_lane_f16(float32x4_t a, float32x4_t b, float32x4_t c) {
>   c += a * b;
>   return c;
> }
>
> $ clang -O3 -march=armv8.2-a+fp16fml -ffast-math -S -o- vfp16.c
> test_vfma_lane_f16:                     // @test_vfma_lane_f16
>                 fmul       v0.4h, v1.4h, v0.4h
>                 fadd       v0.4h, v0.4h, v2.4h  // fp16 does NOT use FMLA
>                 ret
> $ cat vfp16.c
> #include <arm_neon.h>
> float16x4_t test_vfma_lane_f16(float16x4_t a, float16x4_t b, float16x4_t c) {
>   c += a * b;
>   return c;
> }
>
> --
> Yizhi Liu

--
Yizhi Liu