[LLVMdev] Post-inc combining

Bob Wilson bob.wilson at apple.com
Mon Feb 7 10:09:43 PST 2011


On Feb 6, 2011, at 11:32 PM, Jonas Paulsson wrote:

> When I compile the following program (for ARM):
> 
>   for(i=0;i<n2;i+=n3)
>     {
>       s+=a[i];
>     }
> 
> , with GCC, I get the following loop body, with a post-modify load:
> 
> .L4:
>         add     r1, r1, r3
>         ldr     r4, [ip], r6
>         rsb     r5, r3, r1
>         cmp     r2, r5
>         add     r0, r0, r4
>         bgt     .L4
> 
> With LLVM, however, I get:
> 
> .LBB0_3:                                @ %for.body
>                                         @ =>This Inner Loop Header: Depth=1
>         add     r12, lr, r3
>         ldr     lr, [r0, lr, lsl #2]
>         add     r1, lr, r1
>         cmp     r12, r2
>         mov     lr, r12
>         blt     .LBB0_3
> 
> , which does not seem to be auto-incrementing, I think.

No, it's not using a post-increment load.  There are two separate requirements to make this happen:

* LSR (the loop strength reduce pass) needs to transform the loop so that the load address is a simple induction variable.

* Instruction selection needs to recognize the opportunity to fold the address increment into the load.

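In this case, the first requirement is not met: LSR leaves the load address as base plus scaled index ([r0, lr, lsl #2]) instead of a single pointer that advances on each iteration.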
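
To make the first requirement concrete, here is a rough source-level sketch of the two loop shapes. It is only an illustration: the function names are made up, it assumes `a` is an array of int and n3 > 0, and it shows the shape LSR would need to produce rather than what LSR actually emits.

```c
/* Indexed form, as written in the original source. The load address
   a + 4*i is rebuilt from base and index on every iteration, which
   matches the "ldr lr, [r0, lr, lsl #2]" in the LLVM output. */
int sum_indexed(const int *a, int n2, int n3) {
    int s = 0;
    for (int i = 0; i < n2; i += n3)
        s += a[i];
    return s;
}

/* Pointer form. Here the load address p is itself an induction
   variable that advances by a fixed stride, so the load and the
   pointer update are adjacent and could be folded into one
   post-indexed load such as GCC's "ldr r4, [ip], r6".
   Caveat for portable C: the last "p += n3" can step past the array,
   so this sketch assumes the buffer is large enough for that. */
int sum_pointer(const int *a, int n2, int n3) {
    int s = 0;
    const int *p = a;
    for (int i = 0; i < n2; i += n3) {
        s += *p;   /* load from the current address */
        p += n3;   /* advance the address by the stride */
    }
    return s;
}
```

Both functions compute the same sum; only the second exposes the address in a form the instruction selector can combine.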

> 
> I wonder what I should do to get loops auto-incing generally, for instance in this simple loop:
> 
>   for(i=0;i<256;i++)
>     {
>         s+=a[i];
>     }
> 
> , which now yields
> 
> .LBB0_1:                                @ %for.body
>                                         @ =>This Inner Loop Header: Depth=1
>         ldr     r3, [r0, r2]
>         add     r2, r2, #4
>         add     r1, r3, r1
>         cmp     r2, #1, 22      @ 1024
>         bne     .LBB0_1
> 
> , which uses r0 as base address with r2 as offset. On my target, it is much preferred to use auto-inc in cases like this. I repeat my question, as I don't quite understand why the ldr/add is used by ARM here, instead of post-inc. I guess I would like the DAG combiner to work in cases like this, but it does not seem to do so.

Same issue.  The DAG combiner can't handle it because LSR didn't expose the load address as a simple induction variable.  For example, if the code were something like:

  ldr r3, [r2]
  add r2, r2, #4

...then the DAG combiner could do something with it.
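
As a hedged illustration, the 256-element loop written at the source level in that shape might look like the sketch below. The name `sum256` is hypothetical and `a` is assumed to point to at least 256 ints; the comments show the ARM instructions one would hope for, not output from any particular compiler.

```c
/* Sum 256 ints using the pointer itself as the loop induction variable. */
int sum256(const int *a) {
    int s = 0;
    const int *p = a;
    const int *end = a + 256;   /* one past the last element */
    while (p != end) {
        /* Load, then advance the pointer by one int (4 bytes on ARM).
           The ldr/add pair from above could fold into the single
           post-indexed load "ldr r3, [r2], #4". */
        s += *p++;
    }
    return s;
}
```

Written this way, the loop needs no separate offset register: the exit test compares the pointer against the end address.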

Feel free to file a bug report on these issues.