[LLVMdev] Question about LLVM NEON intrinsics

Fri Sep 21 02:54:41 PDT 2012

On Fri, Sep 21, 2012 at 1:28 AM, Sebastien DELDON-GNB
<sebastien.deldon at st.com> wrote:
> Hi all,
>
> I would like to know whether the LLVM NEON intrinsics are designed to support only types that are 'legal' for the NEON unit.
> Running llc -march=arm -mcpu=cortex-a9 vmax4.ll -o vmax4.s on the following IR:
>
>
> ; ModuleID = 'vmax.ll'
> target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-n32"
> target triple = "armv7-none-linux-androideabi"
>
> define void @vmaxf32(<4 x float> *%C, <4 x float>* %A, <4 x float>* %B) nounwind {
>     %tmp1 = load <4 x float>* %A
>     %tmp2 = load <4 x float>* %B
>     %tmp3 = call <4 x float> @llvm.arm.neon.vmaxs.v4f32(<4 x float> %tmp1, <4 x float> %tmp2)
>     store <4 x float> %tmp3, <4 x float>* %C
>     ret void
> }
>
> declare <4 x float> @llvm.arm.neon.vmaxs.v4f32(<4 x float>, <4 x float>) nounwind readnone
>
> I get the following generated code:
>
> ...
> vmaxf32:                                @ @vmaxf32
> @ BB#0:
>         vld1.64 {d16, d17}, [r2]
>         vld1.64 {d18, d19}, [r1]
>         vmax.f32        q8, q9, q8
>         vst1.64 {d16, d17}, [r0]
>         bx      lr
> ...
>
> Now if I use <16 x float> vectors instead of <4 x float>:
>
> define void @vmaxf32(<16 x float> *%C, <16 x float>* %A, <16 x float>* %B) nounwind {
>     %tmp1 = load <16 x float>* %A
>     %tmp2 = load <16 x float>* %B
>     %tmp3 = call <16 x float> @llvm.arm.neon.vmaxs.v16f32(<16 x float> %tmp1, <16 x float> %tmp2)
>     store <16 x float> %tmp3, <16 x float>* %C
>     ret void
> }
>
> declare <16 x float> @llvm.arm.neon.vmaxs.v16f32(<16 x float>, <16 x float>) nounwind readnone
>
> llc fails with the following message:
>
> SplitVectorResult #0: 0x2258350: v16f32 = llvm.arm.neon.vmaxs 0x2258250, 0x2258050, 0x2258150 [ORD=3] [ID=0]
>
> LLVM ERROR: Do not know how to split the result of this operator!
>
> Is this a bug? If so, I'd be happy to get some pointers on how to fix it.

No, this isn't a bug. Platform-specific intrinsics have platform-specific
semantics, including which types they're defined for. NEON registers are
at most 128 bits wide, so vmax.f32 handles at most 4 x float per
instruction; there is no 16 x float form of that operation.
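
To make the type restriction concrete, here is a rough sketch in portable
C of what a caller would have to do by hand: treat the 16 floats as four
4-lane chunks and apply the maximum to each chunk separately. This is only
a scalar model, not NEON code. The helper names vmax_q_f32 and
vmax_wide_f32 are made up for illustration and are not part of LLVM or any
ARM header; NaN handling is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Number of float lanes in one 128-bit NEON quad register, the widest
   float vector a single vmax.f32 operates on. */
enum { Q_LANES = 4 };

/* Scalar stand-in for one `vmax.f32 qD, qN, qM`: lane-wise maximum of
   two 4-float chunks. Hypothetical helper, not an LLVM or ARM API. */
static void vmax_q_f32(float *dst, const float *a, const float *b) {
    for (size_t lane = 0; lane < Q_LANES; ++lane)
        dst[lane] = a[lane] > b[lane] ? a[lane] : b[lane];
}

/* Lane-wise maximum of two `lanes`-wide vectors, performed as lanes / 4
   independent quad-register-sized operations. With lanes == 16 this is
   the four-way split that has to be written out explicitly. */
static void vmax_wide_f32(float *dst, const float *a, const float *b,
                          size_t lanes) {
    assert(lanes % Q_LANES == 0);   /* only whole 4-lane chunks */
    for (size_t i = 0; i < lanes; i += Q_LANES)
        vmax_q_f32(dst + i, a + i, b + i);
}
```

In IR terms, the analogous move would be to call the v4f32 form of the
intrinsic once per 4-lane chunk instead of declaring a v16f32 form.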

> If not I would like to know how to determine valid type for a given LLVM intrinsics.

The ARM reference manual is probably your best bet for ARM intrinsics.

-Eli