[PATCH] [AArch64] Lower sdiv x, pow2 using add + select + shift.

Wed Jul 9 10:20:23 PDT 2014

Hi Chad,

This is interesting. I suppose this relies on X being non-negative much more often than it is negative?

I can't think of why this sequence would be faster otherwise - when X >= 0 the csel just passes X through, so it resolves to nothing as soon as X is known.
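For reference, here is a minimal C sketch of the add + select + shift lowering named in the subject line, assuming a divisor of 16 (so the bias is 15). The function name is hypothetical, and the sketch assumes that `>>` on a negative signed value is an arithmetic shift, which C leaves implementation-defined but which holds for the usual AArch64 compilers.

```c
#include <stdint.h>

/* Hypothetical helper: signed divide by 16, rounding toward zero,
 * written as add + select + arithmetic shift. */
static int32_t sdiv16_select(int32_t x)
{
    /* add + csel: pick the biased value only when x is negative.
     * For x < 0, x + 15 cannot overflow. */
    int32_t t = (x < 0) ? x + 15 : x;

    /* asr: assumed to be an arithmetic shift (implementation-defined in C). */
    return t >> 4;
}
```

Without the bias, the shift alone would round negative values toward minus infinity (e.g. -1 >> 4 is -1), whereas sdiv must round toward zero.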

Extending this, it seems plausible that a sequence using a branch instead of a select would be even faster on OoO cores, since the branch could resolve as soon as X is known:

> add w0, X, 15
> cmp X, wzr
> b.lt 2f
> 1:
> ... continue basic block
> ... end basic block
> 
> 2:
> mov X, w0
> b 1b
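In C terms, the branching variant would look roughly like the sketch below, again assuming a divisor of 16 and an arithmetic `>>` on negative values. The function name is hypothetical, and a compiler is free to turn the `if` back into a csel, so this only illustrates the intended control flow, not what any given backend will emit.

```c
#include <stdint.h>

/* Hypothetical helper: signed divide by 16 with the bias applied
 * on a separate (expected to be cold) path. */
static int32_t sdiv16_branch(int32_t x)
{
    if (x < 0) {    /* b.lt 2f: taken only for negative x */
        x += 15;    /* 2: replace X with the biased value X + 15 */
    }
    /* 1: fall-through path continues with the shift.
     * Assumed to be an arithmetic shift (implementation-defined in C). */
    return x >> 4;
}
```

If X is non-negative most of the time, the branch is well predicted and the common path only does the compare and the shift.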

Have you tried generating such a sequence? What core did you measure the speedup on - A53, A57 or another?

Cheers,

James

http://reviews.llvm.org/D4438