[LLVMdev] Simple NEON optimization
Renato Golin
renato.golin at arm.com
Fri Nov 12 07:23:37 PST 2010
Hi folks, me again,
So, I want to implement a simple optimization for a NEON case I've seen
recently, mostly as an exercise, but it also simplifies the generated
code (just a bit).
The case is simple:
uint32x2_t x, res;
res = vceq_u32(x, vcreate_u32(0));
This will generate the following code:
; zero d16
vmov.i32 d16, #0x0
; load a into d17
movw r0, :lower16:a
movt r0, :upper16:a
vld1.32 {d17}, [r0]
; compare two registers
vceq.i32 d17, d17, d16
But, because the vector is zero, and there is a NEON instruction to
compare against an immediate zero (VCEQZ), we could combine the two
instructions:
; load a into d17
movw r0, :lower16:a
movt r0, :upper16:a
vld1.32 {d17}, [r0]
; compare against immediate zero
vceq.i32 d17, d17, #0
thus, saving the VMOV.
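The fold is safe because both forms produce the same lane-wise result. Here is a minimal scalar model in C, assuming the usual VCEQ semantics (a result lane is all ones when the inputs are equal, zero otherwise). The u32x2 type and the model_* / forms_agree names are made up for illustration; they stand in for the real intrinsics so the sketch builds on any host:

```c
#include <stdint.h>

/* Scalar stand-in for a 64-bit NEON vector of two 32-bit lanes.
   Hypothetical type, used only so the sketch builds without arm_neon.h. */
typedef struct { uint32_t lane[2]; } u32x2;

/* Models the register form of VCEQ: each result lane is all ones
   when the two input lanes are equal, and zero otherwise. */
static u32x2 model_vceq(u32x2 a, u32x2 b) {
    u32x2 r;
    for (int i = 0; i < 2; ++i)
        r.lane[i] = (a.lane[i] == b.lane[i]) ? UINT32_MAX : 0u;
    return r;
}

/* Models the immediate form (VCEQ with #0): compares each lane to zero. */
static u32x2 model_vceqz(u32x2 a) {
    u32x2 r;
    for (int i = 0; i < 2; ++i)
        r.lane[i] = (a.lane[i] == 0u) ? UINT32_MAX : 0u;
    return r;
}

/* Returns 1 when comparing x against an all-zero vector gives the
   same lanes as the compare-against-immediate-zero form. */
static int forms_agree(u32x2 x) {
    const u32x2 zero = {{0u, 0u}};
    u32x2 p = model_vceq(x, zero);
    u32x2 q = model_vceqz(x);
    return p.lane[0] == q.lane[0] && p.lane[1] == q.lane[1];
}
```

Since the zero vector only ever feeds the comparison, the VMOV that materializes it becomes dead once the immediate form is used.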
I know, it's not much, but it's a good start for me to get the hang of
writing such passes.
So, should I put this as a special case in the NEON lowering, or make it
part of an optimization pass? Which classes should I look at first?
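To make the question concrete, here is a rough sketch in C of the kind of fold I have in mind, written against a toy instruction list. None of this is LLVM API: the opcodes, struct inst, and the helper names are all hypothetical, and it assumes a single straight-line block with no registers live on exit:

```c
#include <stdint.h>

/* Hypothetical toy opcodes; these are not LLVM's. */
enum opcode { OP_NOP, OP_VMOV_IMM, OP_VLD1, OP_VCEQ, OP_VCEQZ };

struct inst {
    enum opcode op;
    int dst;       /* destination D register number */
    int src0;      /* first source register, or -1 if unused */
    int src1;      /* second source register, or -1 if unused */
    uint32_t imm;  /* immediate, meaningful for OP_VMOV_IMM only */
};

/* Index of the last instruction before pos that writes reg, or -1. */
static int find_def(const struct inst *code, int pos, int reg) {
    for (int i = pos - 1; i >= 0; --i)
        if (code[i].op != OP_NOP && code[i].dst == reg)
            return i;
    return -1;
}

/* Nonzero if reg is read in [from, n) before being overwritten.
   Assumes nothing is live out of the block. */
static int is_read_after(const struct inst *code, int n, int from, int reg) {
    for (int i = from; i < n; ++i) {
        if (code[i].op == OP_NOP)
            continue;
        if (code[i].src0 == reg || code[i].src1 == reg)
            return 1;
        if (code[i].dst == reg)
            return 0;
    }
    return 0;
}

/* Rewrites VCEQ whose operand comes from "VMOV #0" into VCEQZ and
   drops the VMOV when nothing else reads it. Returns the fold count. */
static int fold_vceq_zero(struct inst *code, int n) {
    int folded = 0;
    for (int i = 0; i < n; ++i) {
        if (code[i].op != OP_VCEQ)
            continue;
        /* Equality is commutative, so try the zero in either operand. */
        for (int k = 0; k < 2; ++k) {
            int zero_reg = k ? code[i].src0 : code[i].src1;
            int other    = k ? code[i].src1 : code[i].src0;
            int def = find_def(code, i, zero_reg);
            if (def < 0 || code[def].op != OP_VMOV_IMM || code[def].imm != 0)
                continue;
            code[i].op = OP_VCEQZ;
            code[i].src0 = other;
            code[i].src1 = -1;
            ++folded;
            if (!is_read_after(code, n, def + 1, zero_reg))
                code[def].op = OP_NOP;  /* the VMOV is now dead */
            break;
        }
    }
    return folded;
}
```

Run on the three-instruction sequence above (VMOV, VLD1, VCEQ), this turns the VCEQ into the immediate form and marks the VMOV as dead. Whether the real thing belongs in lowering or in a separate pass is exactly what I'm unsure about.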
--
cheers,
--renato