[PATCH] Unrolling improvements (target indep. and for x86)

Sat Mar 1 00:58:02 PST 2014

----- Original Message -----
> From: "Diego Novillo" <dnovillo at google.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "llvm-commits" <llvm-commits at cs.uiuc.edu>, "Chandler Carruth" <chandlerc at google.com>, "Nadav Rotem"
> <nrotem at apple.com>
> Sent: Friday, February 28, 2014 7:01:02 AM
> Subject: Re: [PATCH] Unrolling improvements (target indep. and for x86)
> 
> Hal,
> 
> I've tested the partial unrolling patch with our internal benchmarks.
> It gives a nice ~5% improvement on one of them while keeping code
> size
> almost identical.  The other benchmarks are unaffected.  On SPEC
> 2006,
> the effects are in the noise, so all in all I would say that it is
> safe to include.
> 
> One thing I did not do here is to make sure the machine I was
> building
> on was the same flavour of x86 as the machine running the benchmark
> (I
> think they were, however).  I did not use any -mtune options and
> allowed ::getUnrollingPreferences to choose its max ops factor by
> itself.
> 
> Question on the patch:
> 
> +  // Scan the loop: don't unroll loops with calls, and count
> potential branches.
> +  // FIXME: This branch check is not quite right because we should
> be counting
> +  // the number of branches after unrolling, not before unrolling.
> 
> Isn't this a matter of estimating how many back branches you removed
> and account for the body expansion with branches that will be added
> by
> it?

Yes; the correct way to do this is to pass the limit to the unroller; we can't really estimate it here (except for being over conservative). The docs say that the limit is on the number of "taken" branches in the buffer, and so we really want to do some tree depth calculation. I'm unsure how much that matters for such small loops.

> 
> How about allowing this by default on x86?  I'd like to experiment
> with larger unroll factors for known-to-be-hot loops.

You mean allowing this for all target cpus?

Thanks again,
Hal

> 
> I am going over the other two patches today.
> 
> 
> Diego.
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory