[PATCH] D25966: [AArch64] Lower multiplication by a constant int to shl+add+shl

Sat Oct 29 09:08:02 PDT 2016

haicheng added a comment.

Thank you, Gerolf.

In https://reviews.llvm.org/D25966#581742, @Gerolf wrote:

> Hi Haicheng,
>
> I just have a few observations/food for thought:
>
> - Nit: In your Summary I think you swapped n and m in your code snippets vs your formulas. Your code is correct though.

Thank you for catching this.  I updated the summary.

> - The 2^N-1 * 2^M reduction increases code size, so it should not fire under Oz. Otherwise similar consideration as to your major case apply
> - The 2^N+1 * 2^M reduction increases schedule height (at least on most processors). It might also increase code when e.g. add+mul could be combined to madd. But when code size is *not* a concern and latency(lsl) + 1 < latency (mul), latency(madd) it should always be a win. But that target dependence is not checked in your code yet.
> - I would look at the machine combiner only for cases that need more global scheduling context to decide

I agree with everything you said.  I tried to be conservative in this patch so that it neither increases code size nor interferes with the generation of madd.  To support these cases, I think I need to check the target and compare the costs of the different code sequences.
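
To make the trade-off concrete, here is a minimal standalone sketch (not the code in this patch) of the decomposition and the kind of check discussed above. All names (`ShiftAddForm`, `decompose`, `mulViaShifts`, `isProfitable`) and the latency parameters are hypothetical and are not LLVM APIs. The profitability rule is only one reading of Gerolf's condition, latency(lsl) + 1 < latency(mul), latency(madd), and declining every trailing-shift form under minsize is a simplifying assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical description of C == (2^N +/- 1) * 2^M.
struct ShiftAddForm {
  unsigned N;  // shift feeding the add/sub
  unsigned M;  // trailing shift
  bool IsSub;  // true for (2^N - 1), false for (2^N + 1)
};

static bool isPowerOf2(uint64_t V) { return V != 0 && (V & (V - 1)) == 0; }

static unsigned log2Floor(uint64_t V) {
  unsigned L = 0;
  while (V >>= 1)
    ++L;
  return L;
}

// Try to write C as (2^N + 1) * 2^M or (2^N - 1) * 2^M with N >= 1.
// Zero and pure powers of two are rejected (a single shift covers those).
std::optional<ShiftAddForm> decompose(uint64_t C) {
  if (C == 0)
    return std::nullopt;
  unsigned M = 0;
  while ((C & 1) == 0) {  // strip the 2^M factor
    C >>= 1;
    ++M;
  }
  if (C >= 3 && isPowerOf2(C - 1))
    return ShiftAddForm{log2Floor(C - 1), M, false};
  if (C >= 3 && isPowerOf2(C + 1))
    return ShiftAddForm{log2Floor(C + 1), M, true};
  return std::nullopt;
}

// Evaluate X * C using shl + add/sub + shl (wraps modulo 2^64, like mul).
uint64_t mulViaShifts(uint64_t X, const ShiftAddForm &F) {
  assert(F.N < 64 && F.M < 64 && "shift amounts must be in range");
  uint64_t Inner = F.IsSub ? (X << F.N) - X : (X << F.N) + X;
  return Inner << F.M;
}

// Assumed cost rule: never expand a trailing-shift form when optimizing for
// minimum size; otherwise require lsl + add/sub to beat both mul and madd.
bool isProfitable(const ShiftAddForm &F, bool OptForMinSize,
                  unsigned LslLatency, unsigned MulLatency,
                  unsigned MaddLatency) {
  if (OptForMinSize && F.M > 0)
    return false;
  return LslLatency + 1 < std::min(MulLatency, MaddLatency);
}
```

For example, 40 = (2^2 + 1) * 2^3 gives N = 2, M = 3 in the add form, and 28 = (2^3 - 1) * 2^2 gives N = 3, M = 2 in the sub form. A constant such as 11 fits neither form and would stay a mul.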

> Like Renato I'm also curious about your gains. How big? Which benchmarks?

Please see my response to Renato above.

Repository:
  rL LLVM

https://reviews.llvm.org/D25966