[PATCH] Unrolling improvements (target indep. and for x86)

Tue Apr 1 12:23:48 PDT 2014

----- Original Message -----
> From: "Hal Finkel" <hfinkel at anl.gov>
> To: "Chandler Carruth" <chandlerc at google.com>
> Cc: "Diego Novillo" <dnovillo at google.com>, "llvm-commits" <llvm-commits at cs.uiuc.edu>, "Nadav Rotem"
> <nrotem at apple.com>
> Sent: Monday, March 31, 2014 6:43:33 PM
> Subject: Re: [PATCH] Unrolling improvements (target indep. and for x86)
> 
> ----- Original Message -----
> > From: "Chandler Carruth" <chandlerc at google.com>
> > To: "Hal Finkel" <hfinkel at anl.gov>
> > Cc: "Diego Novillo" <dnovillo at google.com>, "llvm-commits"
> > <llvm-commits at cs.uiuc.edu>, "Nadav Rotem"
> > <nrotem at apple.com>
> > Sent: Thursday, March 6, 2014 1:42:38 PM
> > Subject: Re: [PATCH] Unrolling improvements (target indep. and for
> > x86)
> > 
> > On Mon, Mar 3, 2014 at 2:54 PM, Hal Finkel < hfinkel at anl.gov >
> > wrote:
> > 
> > 
> > However, Chandler had made a similar change to the loop vectorizer
> > (to prefer power-of-2 unrolling factors), because it helps with X86
> > addressing modes, and as a result, I wonder whether it is worth
> > keeping this part regardless. Chandler?
> > To try and answer this specific question first:
> > 
> > 
> > The power-of-two thing is an interesting and pesky issue. I suspect
> > that it matters more for "widening" style unrolls than for this
> > version. Consider that in the widening version, we need to have all
> > N unroll-step pointers live at the same time (potentially), whereas
> > here we can just lea them however we like on each unrolled
> > iteration.
> > 
> > 
> > So I wouldn't include it until we have benchmarks showing it
> > matters.
> > 
> > Regarding the entire patch series, please commit everything but the
> > gep-cost-metric change at least. I'm more hesitant there and would
> > like to stare at the patch a bit.
> 
> I committed the optimization pipeline change in r205264. Next I'll
> commit the change to set the x86 unrolling preferences, but not
> until later tonight or tomorrow.

I committed the remaining (non-gep-cost-metric-change) bits in r205347/205348 (along with some other changes needed to avoid altering the thresholds for full unrolling). I apologize for the delay. The committed version resolves the FIXME re: branch counting, to an extent that satisfies me for this initial implementation. Also, in response to Diego's review comment, I've adjusted the way that the command-line arguments are handled to make them more useful for testing on all cores. There was one independently useful piece of refactoring in the gep-cost-metric change that I committed in r205346.

Thanks again,
Hal

> 
>  -Hal
> 
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory