[PATCH] [AArch64] Lower sdiv x, pow2 using add + select + shift.

Thu Jul 10 04:22:10 PDT 2014

Tim,

FWIW, my investigations with Cortex-A53 and Cortex-A57 have shown that the reasons the new sequence is faster are entirely microarchitectural, so I wouldn't personally try to extrapolate its performance to any other ISA.

James

> -----Original Message-----
> From: Tim Northover [mailto:t.p.northover at gmail.com]
> Sent: 10 July 2014 12:11
> To: James Molloy
> Cc: Chad Rosier; James Molloy;
> reviews+D4438+public+a452d711668fdd91 at reviews.llvm.org; Chandler Carruth;
> Jiangning Liu; Jim Grosbach; LLVM Commits; silviu.baranga at gmail.com
> Subject: Re: [PATCH] [AArch64] Lower sdiv x, pow2 using add + select +
> shift.
> 
> > I’ve taken a look at the performance of that code sequence, and can confirm
> > that it is no worse in all situations than the current sequence. In some
> > situations it causes a ~5% performance uplift on A53, and in some cases a
> > ~20% performance uplift in A57 (on a microbenchmark running this sequence in
> > a loop).
> 
> I've run some very simplistic tests on Cyclone and it seems like it's
> helpful there too.
> 
> As for the code, that's a fairly nasty function to spot the idiom. It
> would be better if we could avoid forming the original pattern in the
> first place. What if we move the call to BuildSDIV above the pow2
> block in DAGCombiner::visitSDIV and override it on AArch64 to handle
> this specific case?
> 
> I'd actually be interested to know which sequence is better on other
> targets: the existing one seems to win on my Sandy Bridge x86, but I'm
> slightly suspicious of it in general. Never mind, we can ignore that
> for this patch, I think.
> 
> Cheers.
> 
> Tim.
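
For readers following the thread, here is a rough C sketch of the two ways of computing sdiv x, 2^k that are being compared. It is not taken from the patch: the function names are made up for illustration, the instruction mapping in the comments is my reading of the subject line, and it assumes 32-bit values, 1 <= k <= 30, and a compiler where >> on a negative signed value is an arithmetic shift (as GCC and Clang do).

```c
#include <stdint.h>

/* Hypothetical name. Shift-based sequence (roughly what the generic
 * pow2 lowering produces): derive the rounding bias from the sign bit. */
int32_t sdiv_pow2_shift(int32_t x, unsigned k)
{
    /* x >> 31 is all ones for negative x, zero otherwise
     * (assumes arithmetic right shift on signed values). */
    uint32_t sign = (uint32_t)(x >> 31);
    /* Keep the low k bits of the sign mask: 2^k - 1 if x < 0, else 0. */
    uint32_t bias = sign >> (32 - k);
    /* bias is nonzero only when x is negative, so this cannot overflow. */
    return (x + (int32_t)bias) >> k;
}

/* Hypothetical name. add + select + shift, as in the subject line.
 * On AArch64 this would presumably map to add, cmp, csel, asr. */
int32_t sdiv_pow2_select(int32_t x, unsigned k)
{
    int32_t mask = (int32_t)((1u << k) - 1);
    /* Bias only negative inputs so the shift rounds toward zero. */
    int32_t biased = (x < 0) ? x + mask : x;
    return biased >> k;
}
```

Both functions should agree with x / (1 << k) for every x in range; the only difference is how the bias for negative inputs is formed, which is why the performance gap would be microarchitectural rather than algorithmic.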