[PATCH] [AArch64] Lower sdiv x, pow2 using add + select + shift.
James Molloy
james.molloy at arm.com
Wed Jul 9 10:20:23 PDT 2014
Hi Chad,
This is interesting. I suppose this is relying on the fact that X is >= 0 much more often than it is negative?
I can't think of why this sequence would be faster otherwise - the csel is resolvable to nothing as soon as X is known (if X >= 0).
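For reference, a minimal C sketch of what the add + select + shift sequence computes, assuming a divisor of 16 to match the constant 15 used in the snippet that follows. The function names are hypothetical and this is not taken from the patch. It also assumes that `>>` on a negative `int32_t` is an arithmetic shift, which C leaves implementation-defined but which holds for the usual AArch64 compilers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: x / 16 rounded toward zero, written the way an
 * add + csel + asr sequence would compute it.
 * Assumption: >> on a negative int32_t is an arithmetic shift. */
static int32_t sdiv16_select(int32_t x) {
    /* The hardware would compute x + 15 unconditionally and let csel
     * choose; here the add happens only when x < 0, so the signed add
     * cannot overflow in C. */
    int32_t picked = (x < 0) ? x + 15 : x;  /* add + cmp + csel */
    return picked >> 4;                     /* asr #4 */
}

/* Compare against C's own truncating division over a small range. */
static void sdiv16_select_selfcheck(void) {
    for (int32_t x = -64; x <= 64; ++x)
        assert(sdiv16_select(x) == x / 16);
}
```

The bias of 15 is only needed for negative X: without it the arithmetic shift would round toward negative infinity rather than toward zero.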
Extending this, it seems not improbable that a sequence involving a branch instead of a select would be even faster on OoO cores as it would allow the branch to resolve as soon as X is known:
> add w0, X, 15
> cmp X, wzr
> b.lt 2f
> 1:
> ... continue basic block
> ... end basic block
>
> 2:
> mov X, w0
> b 1b
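At the C level the branch-based variant above amounts to roughly the following. This is a sketch only: the function names are hypothetical, the divisor of 16 is inferred from the constant 15, and the final shift is assumed to sit in the "continue basic block" part, which the snippet does not show. It again assumes that `>>` on a negative `int32_t` is an arithmetic shift.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: x / 16 rounded toward zero, using a branch on
 * the sign of x instead of a select.
 * Assumption: >> on a negative int32_t is an arithmetic shift. */
static int32_t sdiv16_branch(int32_t x) {
    if (x < 0)      /* b.lt 2f: expected to be the rarely taken path */
        x += 15;    /* 2: mov X, w0, where w0 holds X + 15 */
    return x >> 4;  /* assumed asr #4 in the continued basic block */
}

/* Compare against C's own truncating division over a small range. */
static void sdiv16_branch_selfcheck(void) {
    for (int32_t x = -64; x <= 64; ++x)
        assert(sdiv16_branch(x) == x / 16);
}
```

Whether a compiler keeps this as a branch or turns it back into a select depends on the target and optimisation level, so the sketch shows the intended semantics rather than the generated code.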
Have you tried generating such a sequence? What core did you measure the speedup on - A53, A57 or another?
Cheers,
James
http://reviews.llvm.org/D4438