[PATCH] D25966: [AArch64] Lower multiplication by a constant int to shl+add+shl
Haicheng Wu via llvm-commits
llvm-commits at lists.llvm.org
Sat Oct 29 09:08:02 PDT 2016
haicheng added a comment.
Thank you, Gerolf.
In https://reviews.llvm.org/D25966#581742, @Gerolf wrote:
> Hi Haicheng,
>
> I just have a few observations/food for thought:
>
> - Nit: In your Summary I think you swapped n and m in your code snippets vs your formulas. Your code is correct though.
Thank you for catching this. I updated the summary.
> - The (2^N-1) * 2^M reduction increases code size, so it should not fire under Oz. Otherwise, similar considerations as in your major case apply.
> - The (2^N+1) * 2^M reduction increases schedule height (at least on most processors). It might also increase code size when e.g. add+mul could be combined to madd. But when code size is *not* a concern and latency(lsl) + 1 < latency(mul), latency(madd), it should always be a win. But that target dependence is not checked in your code yet.
> - I would look at the machine combiner only for cases that need more global scheduling context to decide
I agree with everything you said. I tried to be conservative in this patch so that it does not increase code size or interfere with the generation of madd. If I want to support my cases, I think I need to check the target and compare the cost of the different code sequences.
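To make the two reductions discussed above concrete, here is a minimal standalone sketch of the arithmetic. It is not the code from the patch. The names `Decomp`, `decompose`, `apply`, `latencyWin` and the `minSize` flag are hypothetical and only illustrate the idea. Gating the (2^N-1) * 2^M form on `minSize` follows Gerolf's Oz remark, and `latencyWin` is just one reading of his latency condition, with latencies supplied by the caller rather than taken from any real scheduling model.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper types for illustration; not taken from D25966.
enum class Kind { ShlAddShl, ShlSubShl };

struct Decomp {
  Kind kind;
  unsigned n; // inner shift amount (the N in 2^N +/- 1)
  unsigned m; // outer shift amount (the M in 2^M)
};

static bool isPowerOf2(uint64_t v) { return v != 0 && (v & (v - 1)) == 0; }

static unsigned log2Exact(uint64_t v) {
  unsigned n = 0;
  while (v > 1) {
    v >>= 1;
    ++n;
  }
  return n;
}

// Try to write c as (2^n + 1) * 2^m or (2^n - 1) * 2^m.
// Assumption: the subtract form is skipped when minSize is set, mirroring
// the suggestion that it should not fire under Oz.
std::optional<Decomp> decompose(uint64_t c, bool minSize) {
  if (c == 0)
    return std::nullopt;
  unsigned m = 0;
  while ((c & 1) == 0) { // strip trailing zeros: these become the outer shl
    c >>= 1;
    ++m;
  }
  if (c < 3) // c == 1 means a pure power of two: a single shift suffices
    return std::nullopt;
  if (isPowerOf2(c - 1))
    return Decomp{Kind::ShlAddShl, log2Exact(c - 1), m};
  if (!minSize && isPowerOf2(c + 1))
    return Decomp{Kind::ShlSubShl, log2Exact(c + 1), m};
  return std::nullopt;
}

// Evaluate the shift/add (or shift/sub) sequence on x, modulo 2^64.
uint64_t apply(uint64_t x, const Decomp &d) {
  uint64_t inner =
      d.kind == Kind::ShlAddShl ? (x << d.n) + x : (x << d.n) - x;
  return inner << d.m;
}

// One reading of "latency(lsl) + 1 < latency(mul), latency(madd)":
// the shifted sequence must beat both. All latencies are caller-supplied.
bool latencyWin(unsigned latLsl, unsigned latMul, unsigned latMadd) {
  return latLsl + 1 < latMul && latLsl + 1 < latMadd;
}
```

For example, 40 = (2^2+1) * 2^3, so `decompose(40, false)` yields n = 2, m = 3, and `apply(x, ...)` computes ((x << 2) + x) << 3. Similarly, 28 = (2^3-1) * 2^2 is only decomposed when `minSize` is false.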
> Like Renato I'm also curious about your gains. How big? Which benchmarks?
Please see my response to Renato above.
Repository:
rL LLVM
https://reviews.llvm.org/D25966