[LLVMdev] Simple NEON optimization

Fri Nov 12 07:23:37 PST 2010

Hi folks, me again,

So, I want to implement a simple optimization for a NEON pattern I've run
into recently, mostly as an exercise, though it also simplifies (just a
bit) the generated code.

The case is simple:

        uint32x2_t a, res;   /* a is a global, loaded from memory below */
        res = vceq_u32(a, vcreate_u32(0));

This will generate the following code:

        ; zero d16
        vmov.i32        d16, #0x0
        ; load a into d17
        movw    r0, :lower16:a
        movt    r0, :upper16:a
        vld1.32 {d17}, [r0]
        ; compare two registers
        vceq.i32        d17, d17, d16

But since the second operand is all zeros, and NEON has a form of the
compare that takes an immediate zero (VCEQZ), the two instructions can be
combined:

        ; load a into d17
        movw    r0, :lower16:a
        movt    r0, :upper16:a
        vld1.32 {d17}, [r0]
        ; compare against immediate zero
        vceq.i32        d17, d17, #0

thus saving the VMOV.

I know it's not much, but it's a good way for me to get the hang of
writing such passes.

So, should I put this in as a special case in the NEON lowering, or make
it part of an optimization pass? Which classes should I look at first?

-- 
cheers,
--renato