[PATCH] Unrolling improvements (target indep. and for x86)

Fri Feb 28 05:01:02 PST 2014

Hal,

I've tested the partial unrolling patch with our internal benchmarks.
It gives a nice ~5% improvement on one of them while keeping code size
almost identical.  The other benchmarks are unaffected.  On SPEC 2006,
the effects are in the noise, so all in all I would say that it is
safe to include.

One thing I did not do here is to make sure the machine I was building
on was the same flavour of x86 as the machine running the benchmark (I
think they were, however).  I did not use any -mtune options and
allowed ::getUnrollingPreferences to choose its max ops factor by
itself.

Question on the patch:

+  // Scan the loop: don't unroll loops with calls, and count
potential branches.
+  // FIXME: This branch check is not quite right because we should be counting
+  // the number of branches after unrolling, not before unrolling.

Isn't this a matter of estimating how many back branches you removed
and account for the body expansion with branches that will be added by
it?

How about allowing this by default on x86?  I'd like to experiment
with larger unroll factors for known-to-be-hot loops.

I am going over the other two patches today.

Diego.