[LLVMdev] Question about LLVM NEON intrinsics

Fri Sep 21 09:54:09 PDT 2012

On Sep 21, 2012, at 2:58 AM, Sebastien DELDON-GNB <sebastien.deldon at st.com> wrote:

> Hi Eli,
> 
> Thanks for the answer, it clarifies the situation for me. Do you know if there is a pass in LLVM that could be adapted to 'legalize' intrinsic calls?
> Or should I define my own intrinsics for unsupported types?

You should never generate these sorts of intrinsics with non-legal types. It's the job of the front end to make sure that they are only called with legal types. Yes, this is different from normal LLVM IR, where the backend legalizes generic operations on illegal types for you.
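
As a rough sketch of what that could look like for the <16 x float> case quoted below: the front end would split the operation itself into four calls on the legal <4 x float> type and reassemble the result. This is only an illustration, not something taken from the NEON backend or from clang. The function name @vmaxf32_v16 and the value names are made up, the syntax follows the 2012-era IR used in this thread, and it has not been run through llc.

```llvm
; Hypothetical front-end output: a <16 x float> max expressed with the
; legal <4 x float> NEON intrinsic only. All names here are illustrative.
define void @vmaxf32_v16(<16 x float>* %C, <16 x float>* %A, <16 x float>* %B) nounwind {
  %a = load <16 x float>* %A
  %b = load <16 x float>* %B

  ; Extract the four 4-lane quarters of each operand.
  %a0 = shufflevector <16 x float> %a, <16 x float> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  %a1 = shufflevector <16 x float> %a, <16 x float> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
  %a2 = shufflevector <16 x float> %a, <16 x float> undef, <4 x i32> <i32 8, i32 9, i32 10, i32 11>
  %a3 = shufflevector <16 x float> %a, <16 x float> undef, <4 x i32> <i32 12, i32 13, i32 14, i32 15>
  %b0 = shufflevector <16 x float> %b, <16 x float> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  %b1 = shufflevector <16 x float> %b, <16 x float> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
  %b2 = shufflevector <16 x float> %b, <16 x float> undef, <4 x i32> <i32 8, i32 9, i32 10, i32 11>
  %b3 = shufflevector <16 x float> %b, <16 x float> undef, <4 x i32> <i32 12, i32 13, i32 14, i32 15>

  ; Call the intrinsic only with the legal <4 x float> type.
  %r0 = call <4 x float> @llvm.arm.neon.vmaxs.v4f32(<4 x float> %a0, <4 x float> %b0)
  %r1 = call <4 x float> @llvm.arm.neon.vmaxs.v4f32(<4 x float> %a1, <4 x float> %b1)
  %r2 = call <4 x float> @llvm.arm.neon.vmaxs.v4f32(<4 x float> %a2, <4 x float> %b2)
  %r3 = call <4 x float> @llvm.arm.neon.vmaxs.v4f32(<4 x float> %a3, <4 x float> %b3)

  ; Concatenate the four results back into one <16 x float> value.
  %lo = shufflevector <4 x float> %r0, <4 x float> %r1, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
  %hi = shufflevector <4 x float> %r2, <4 x float> %r3, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
  %r = shufflevector <8 x float> %lo, <8 x float> %hi, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>

  store <16 x float> %r, <16 x float>* %C
  ret void
}

declare <4 x float> @llvm.arm.neon.vmaxs.v4f32(<4 x float>, <4 x float>) nounwind readnone
```

The wide load, store and shufflevector are generic IR operations, so the backend is expected to legalize those; only the intrinsic call has to be given a legal type up front.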

> 
> Best Regards
> Seb
> ________________________________________
> From: Eli Friedman [eli.friedman at gmail.com]
> Sent: Friday, September 21, 2012 11:54
> To: Sebastien DELDON-GNB
> Cc: llvmdev at cs.uiuc.edu
> Subject: Re: [LLVMdev] Question about LLVM NEON intrinsics
> 
> On Fri, Sep 21, 2012 at 1:28 AM, Sebastien DELDON-GNB
> <sebastien.deldon at st.com> wrote:
>> Hi all,
>> 
>> I would like to know whether the LLVM NEON intrinsics are designed to support only types that are 'legal' for the NEON unit.
>> Running llc -march=arm -mcpu=cortex-a9 vmax4.ll -o vmax4.s on the following .ll code:
>> 
>> 
>> ; ModuleID = 'vmax.ll'
>> target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-n32"
>> target triple = "armv7-none-linux-androideabi"
>> 
>> define void @vmaxf32(<4 x float> *%C, <4 x float>* %A, <4 x float>* %B) nounwind {
>>    %tmp1 = load <4 x float>* %A
>>    %tmp2 = load <4 x float>* %B
>>    %tmp3 = call <4 x float> @llvm.arm.neon.vmaxs.v4f32(<4 x float> %tmp1, <4 x float> %tmp2)
>>    store <4 x float> %tmp3, <4 x float>* %C
>>    ret void
>> }
>> 
>> declare <4 x float> @llvm.arm.neon.vmaxs.v4f32(<4 x float>, <4 x float>) nounwind readnone
>> 
>> I get the following generated code:
>> 
>> ...
>> vmaxf32:                                @ @vmaxf32
>> @ BB#0:
>>        vld1.64 {d16, d17}, [r2]
>>        vld1.64 {d18, d19}, [r1]
>>        vmax.f32        q8, q9, q8
>>        vst1.64 {d16, d17}, [r0]
>>        bx      lr
>> ...
>> 
>> Now if I use <16 x float> vectors instead of <4 x float>:
>> 
>> define void @vmaxf32(<16 x float> *%C, <16 x float>* %A, <16 x float>* %B) nounwind {
>>    %tmp1 = load <16 x float>* %A
>>    %tmp2 = load <16 x float>* %B
>>    %tmp3 = call <16 x float> @llvm.arm.neon.vmaxs.v16f32(<16 x float> %tmp1, <16 x float> %tmp2)
>>    store <16 x float> %tmp3, <16 x float>* %C
>>    ret void
>> }
>> 
>> declare <16 x float> @llvm.arm.neon.vmaxs.v16f32(<16 x float>, <16 x float>) nounwind readnone
>> 
>> llc fails with the following message:
>> 
>> SplitVectorResult #0: 0x2258350: v16f32 = llvm.arm.neon.vmaxs 0x2258250, 0x2258050, 0x2258150 [ORD=3] [ID=0]
>> 
>> LLVM ERROR: Do not know how to split the result of this operator!
>> 
>> Is this a bug? If so, I would be happy to get some pointers on how to fix it.
> 
> No... platform-specific intrinsics have platform-specific semantics,
> including what types they're defined for. NEON doesn't have 16 x float
> vectors, at least not for that sort of operation.
> 
>> If not, I would like to know how to determine the valid types for a given LLVM intrinsic.
> 
> The ARM reference manual is probably your best bet for ARM intrinsics.
> 
> -Eli