[LLVMdev] Simple NEON optimization

Bob Wilson bob.wilson at apple.com
Fri Nov 12 09:52:20 PST 2010


On Nov 12, 2010, at 7:23 AM, Renato Golin wrote:

> Hi folks, me again,
> 
> So, I want to implement a simple optimization for a NEON case I've seen
> recently, mostly as an exercise, but it also simplifies (just a bit)
> the generated code.
> 
> The case is simple:
> 
>        uint32x2_t x, res;
>        res = vceq_u32(x, vcreate_u32(0));
> 
> This will generate the following code:
> 
>        ; zero d16
>        vmov.i32        d16, #0x0
>        ; load a into d17
>        movw    r0, :lower16:a
>        movt    r0, :upper16:a
>        vld1.32 {d17}, [r0]
>        ; compare two registers
>        vceq.i32        d17, d17, d16
> 
> But, because the vector is zero, and there is a NEON instruction to
> compare against an immediate zero (VCEQZ), we could combine the two
> instructions:
> 
>        ; load a into d17
>        movw    r0, :lower16:a
>        movt    r0, :upper16:a
>        vld1.32 {d17}, [r0]
>        ; compare against immediate zero
>        vceq.i32        d17, d17, #0
> 
> thus, saving the VMOV.
> 
> I know, it's not much, but it's a good start for me to get the hang of
> writing such passes.

This would be a nice optimization, and it's not a bad place to get started down in the depths of llvm codegen.

> 
> So, should I put this as a special case in NEON lowering or make it as
> part of an optimization pass? Which classes should I look first?

I recommend implementing this as a target-specific DAG combine optimization.  We already have target-specific DAG nodes for the relevant NEON comparison operations (ARMISD::VCEQ, etc. -- see ARMISelLowering.h) as well as the vmov (ARMISD::VMOVIMM).  You just need to teach the DAG combiner how to fold them together.  Here's what you need to do (all of this code is in ARMISelLowering.cpp):

0. (You don't actually need to do anything, but I'm just mentioning it FYI.) For selection DAG nodes that are not target-specific, you need to inform the DAG combiner that you want to do some target-specific combining.  Look for calls to setTargetDAGCombine() for examples of this.  For this case, the relevant nodes are all target-specific, so the DAG combiner will call the target-specific combining hook anyway.

1. Add the ARMISD::VCEQ etc. nodes to the switch in ARMTargetLowering::PerformDAGCombine.

2. Add a function to be called for those comparisons that checks if one operand is an ARMISD::VMOVIMM node with an immediate of zero.  Note for future reference that the actual operand of VMOVIMM is an encoded value that represents one of the possible vector immediates for the "one register plus a modified immediate" format.  In this case it doesn't matter because the canonical encoding of a zero vector is just zero.  When you find that case, use DAG.getNode() to return a new node for the compare against zero operation.  The PerformShiftCombine function is a fairly simple example of what needs to be done (although it's doing a completely different combination).

3. Write a testcase and make sure it works.
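
To make the shape of step 2 concrete, here is a rough, self-contained sketch of the folding logic. It is a toy model, not LLVM code: Node, Opcode, makeNode, isZeroVector and performVCEQCombine are all hypothetical names made up for illustration, and the VCEQZ opcode here simply stands for "whatever node you create for the compare against zero operation". The real version would work on SDNode/SDValue and build the result with DAG.getNode().

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-ins for selection DAG concepts. None of this is the LLVM API.
enum class Opcode { Load, VMOVIMM, VCEQ, VCEQZ };

struct Node {
  Opcode Op;
  std::vector<std::shared_ptr<Node>> Operands;
  uint64_t Imm = 0; // only meaningful for VMOVIMM (the encoded immediate)
};
using NodePtr = std::shared_ptr<Node>;

// Hypothetical helper playing the role of DAG.getNode() in this model.
NodePtr makeNode(Opcode Op, std::vector<NodePtr> Ops = {}, uint64_t Imm = 0) {
  auto N = std::make_shared<Node>();
  N->Op = Op;
  N->Operands = std::move(Ops);
  N->Imm = Imm;
  return N;
}

// A zero vector is a VMOVIMM whose encoded immediate is zero.
bool isZeroVector(const NodePtr &N) {
  return N->Op == Opcode::VMOVIMM && N->Imm == 0;
}

// Returns the replacement node, or nullptr if nothing can be folded.
NodePtr performVCEQCombine(const NodePtr &N) {
  if (N->Op != Opcode::VCEQ || N->Operands.size() != 2)
    return nullptr;
  const NodePtr &LHS = N->Operands[0];
  const NodePtr &RHS = N->Operands[1];
  // Equality is commutative, so the zero vector may be on either side.
  if (isZeroVector(RHS))
    return makeNode(Opcode::VCEQZ, {LHS});
  if (isZeroVector(LHS))
    return makeNode(Opcode::VCEQZ, {RHS});
  return nullptr;
}
```

Calling performVCEQCombine on a VCEQ node whose second operand is a zero VMOVIMM gives back a one-operand VCEQZ node; any other input gives back nullptr, which in this model means "leave the node alone". Note that only equality is safe to fold with the operands in either order. For the ordered comparisons you would have to swap the condition when the zero is on the left.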

Thanks for offering to work on this!





