[LLVMdev] RE : Question about LLVM NEON intrinsics

Sebastien DELDON-GNB sebastien.deldon at st.com
Fri Sep 21 02:57:13 PDT 2012

Hello Renato,

You are pointing me at the ARM intrinsics related to loads. The problem I reported in my original e-mail is not about support for vector loads, but about support for 'vmaxs'. For instance, the ARM ISA has no vector load of 16 floats, yet it is legal to write the following in LLVM:

; ModuleID = 'vadd.ll'
target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-n32"
target triple = "armv7-none-linux-androideabi"

define void @vaddf32(<16 x float> *%C, <16 x float>* %A, <16 x float>* %B) nounwind {
    %tmp1 = load <16 x float>* %A
    %tmp2 = load <16 x float>* %B
    %tmp3 = fadd <16 x float> %tmp1, %tmp2
    store <16 x float> %tmp3, <16 x float>* %C
    ret void
}

and llc generates the following code:

vaddf32:                                @ @vaddf32
@ BB#0:
	add	r12, r1, #48
	add	r3, r2, #32
	vld1.64	{d20, d21}, [r3, :128]
	add	r3, r2, #48
	vld1.64	{d16, d17}, [r2, :128]
	add	r2, r2, #16
	vld1.64	{d18, d19}, [r1, :128]
	vld1.64	{d26, d27}, [r12, :128]
	add	r12, r1, #32
	vld1.64	{d24, d25}, [r3, :128]
	add	r1, r1, #16
	vadd.f32	q11, q9, q8
	vld1.64	{d28, d29}, [r12, :128]
	vadd.f32	q9, q13, q12
	vadd.f32	q8, q14, q10
	vld1.64	{d20, d21}, [r2, :128]
	vld1.64	{d24, d25}, [r1, :128]
	add	r1, r0, #48
	vadd.f32	q10, q12, q10
	vst1.64	{d22, d23}, [r0, :128]
	vst1.64	{d18, d19}, [r1, :128]
	add	r1, r0, #32
	add	r0, r0, #16
	vst1.64	{d16, d17}, [r1, :128]
	vst1.64	{d20, d21}, [r0, :128]
	bx	lr
	.size	vaddf32, .Ltmp0-vaddf32

So the 'fadd' instruction operating on a <16 x float> vector is legalized (split) into four vadd.f32 instructions. My assumption was that the same process would apply to NEON LLVM intrinsics such as 'vmaxs'. That does not seem to be the case, so I am wondering whether this is an actual bug or whether LLVM intrinsics are limited to the types that are legal for the target architecture.
Note that although <16 x float> loads are not supported by the ISA, LLVM is able to generate them as a series of vld1.64 instructions.
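
One way to sidestep the failure, assuming the <4 x float> form of the intrinsic is selectable (it should correspond to a single Q-register vmax.f32), would be to do the split by hand in the IR. This is an untested sketch in the same IR syntax as the example above; the function name @vmaxf32 and all value names are only illustrative:

```llvm
; Legal-width form of the intrinsic: one 128-bit NEON register per operand.
declare <4 x float> @llvm.arm.neon.vmaxs.v4f32(<4 x float>, <4 x float>) nounwind readnone

define void @vmaxf32(<16 x float>* %C, <16 x float>* %A, <16 x float>* %B) nounwind {
    %a = load <16 x float>* %A
    %b = load <16 x float>* %B

    ; Extract the four <4 x float> quarters of each operand.
    %a0 = shufflevector <16 x float> %a, <16 x float> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
    %a1 = shufflevector <16 x float> %a, <16 x float> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
    %a2 = shufflevector <16 x float> %a, <16 x float> undef, <4 x i32> <i32 8, i32 9, i32 10, i32 11>
    %a3 = shufflevector <16 x float> %a, <16 x float> undef, <4 x i32> <i32 12, i32 13, i32 14, i32 15>
    %b0 = shufflevector <16 x float> %b, <16 x float> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
    %b1 = shufflevector <16 x float> %b, <16 x float> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
    %b2 = shufflevector <16 x float> %b, <16 x float> undef, <4 x i32> <i32 8, i32 9, i32 10, i32 11>
    %b3 = shufflevector <16 x float> %b, <16 x float> undef, <4 x i32> <i32 12, i32 13, i32 14, i32 15>

    ; One intrinsic call per quarter.
    %m0 = call <4 x float> @llvm.arm.neon.vmaxs.v4f32(<4 x float> %a0, <4 x float> %b0)
    %m1 = call <4 x float> @llvm.arm.neon.vmaxs.v4f32(<4 x float> %a1, <4 x float> %b1)
    %m2 = call <4 x float> @llvm.arm.neon.vmaxs.v4f32(<4 x float> %a2, <4 x float> %b2)
    %m3 = call <4 x float> @llvm.arm.neon.vmaxs.v4f32(<4 x float> %a3, <4 x float> %b3)

    ; Concatenate the quarters back into a <16 x float> result.
    %lo = shufflevector <4 x float> %m0, <4 x float> %m1, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
    %hi = shufflevector <4 x float> %m2, <4 x float> %m3, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
    %r = shufflevector <8 x float> %lo, <8 x float> %hi, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>

    store <16 x float> %r, <16 x float>* %C
    ret void
}
```

This only moves the splitting from the legalizer into the front end; it does not answer whether the legalizer itself should be expected to split target intrinsics.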
I hope this clarifies my request.

Best Regards

From: rengolin at gmail.com [rengolin at gmail.com] on behalf of Renato Golin [rengolin at systemcall.org]
Sent: Friday, 21 September 2012 11:14
To: Sebastien DELDON-GNB
Cc: llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Question about LLVM NEON intrinsics

On 21 September 2012 09:28, Sebastien DELDON-GNB
<sebastien.deldon at st.com> wrote:
> declare <16 x float> @llvm.arm.neon.vmaxs.v16f32(<16 x float>, <16 x float>) nounwind readnone
> llc fails with following message:
> SplitVectorResult #0: 0x2258350: v16f32 = llvm.arm.neon.vmaxs 0x2258250, 0x2258050, 0x2258150 [ORD=3] [ID=0]
> LLVM ERROR: Do not know how to split the result of this operator!
> Is it a BUG ? If yes I'm happy to get some directions on how I can fix it. If not I would like to know how to determine valid type for a given LLVM intrinsics.

I may be wrong, but I don't think there is such a load intrinsic...



